5

I've been trying to save this webpage using every method I know, but none of them have worked so far. The website itself has some great functionality: it renders MathJax in real time, without any noticeable lag. I want to be able to use it offline, so I wanted to save it, but I haven't been very successful. I'm on macOS. Here is what I have tried so far:

  1. Save As in Safari as a Web Archive (.webarchive) – doesn't preserve the page's functionality
  2. Save As in Safari as Page Source (.html) – completely messes the page up
  3. HTTrack – doesn't preserve the webpage's functionality
  4. Save As in Chrome as Webpage, Complete (.html) – messes up layout and functionality
  5. WebDumper – gives me a "Forbidden" error
  6. itsucks – messes the webpage up
  7. SiteSucker – messes the webpage up
  8. ScrapBook (Firefox) – messes the page up
  9. A couple of other things that I can't remember anymore.

I just want to save the website and be able to use it offline. I noticed something interesting, however: when I'm in Safari and I go offline, the webpage still performs fine. This strongly suggests that the webpage can run offline with no problem – I just need a way to save it properly. I suppose I could create a virtual machine, load the site up on it, save it as a snapshot, and use that whenever I want to go offline, but that seems like quite a disproportionate solution for such a seemingly simple problem.

On a side note: would it be possible to save a webpage like this (iPhone 6S page) with all of the scrolling animations, embedded pictures and videos and all the rest? I've only tried creating a Web Archive using Safari, but it only saved the nice scrolling animation – not the embedded pictures and such.

Oion Akif
  • 549

4 Answers

10

Browser add-on: Save Page WE Firefox / Chrome

A Firefox/Chrome add-on that is lighter than the WebRecorder mentioned below and worked well for a subset of use cases. It is configurable and flexible, and can optionally scroll pages in order to retrieve lazy-loaded content. It inlines images, scripts, fonts, etc. as data URLs, producing a single large standalone HTML file.

Browser add-on: WebScrapBook Firefox / Chrome

A different approach to the same task. It can recursively download linked pages to a depth that you specify. It is highly configurable and can remove junk from pages via CSS selectors. It rewrites links to point to locally downloaded assets (and not data URLs!). On the downside, it doesn't have the concept of a 'project', so you cannot update a previously downloaded site. File naming is tricky to get right but possible if you set the right template tokens (this requires carefully reading the ❓ popup for each field). It needs a bit of fiddling with options until it works the way you want. It can be used to recursively save multiple pages on sites that require authentication, i.e. in 'web spider mode'.

Chrome extension and standalone app: WebRecorder

WebRecorder is a 'system for high-fidelity web archiving' (https://github.com/webrecorder/archiveweb.page). It is an open-source project that offers a free Electron-based desktop app and a Chrome extension. Scroll down to Desktop Tools for the app download links (you don't need to create an account in order to use the desktop app).

NOTE: WebRecorder will not save the original markup of the page in an easily parseable format; it uses its own format for storing the data, which makes it harder to pull out the bits you need if you like tinkering with the data.

Note: WebRecorder worked fine with a few medium-complexity sites, but did not save what was expected on some sites that are heavy on JavaScript and built with frameworks such as React, Angular, Vue, etc. (this applies to all of the tools mentioned here).
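
If you do want to pull individual resources back out of a WebRecorder archive, the warcio Python library (from the same webrecorder project) can iterate over WARC records. The sketch below is only an illustration under a couple of assumptions: that you have a .warc.gz file from your export (a .wacz export is essentially a zip with the WARC data inside), and that the file name and output folder are placeholders you would replace with your own.

    # Sketch: list and extract response records from a WARC file produced by WebRecorder.
    # Assumes warcio is installed (pip install warcio); "example.warc.gz" and the
    # "extracted" folder are placeholders.
    from pathlib import Path
    from urllib.parse import urlparse

    from warcio.archiveiterator import ArchiveIterator

    out_dir = Path("extracted")
    out_dir.mkdir(exist_ok=True)

    with open("example.warc.gz", "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue  # skip request, metadata, revisit, etc. records
            url = record.rec_headers.get_header("WARC-Target-URI")
            payload = record.content_stream().read()
            # Derive a rough local file name from the URL path.
            name = urlparse(url).path.strip("/").replace("/", "_") or "index.html"
            (out_dir / name).write_bytes(payload)
            print(f"{url} -> {out_dir / name} ({len(payload)} bytes)")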

ccpizza
  • 8,241
4

It's not possible to do this with many websites these days. And for the sites where it does seem possible, it would still require some JavaScript experience to reverse-engineer and "fix" the scripts that are saved to your computer. There is no single method that works for all websites; you have to work through each unique problem for every site you try to save.


A lot of websites are no longer just static files that are sent from the server to your computer. They have become 2-way interactive applications, where the web browser is running code that continuously interacts with the web server from the same page.

When you load a website in a browser, you are seeing the "front end" of the entire system that makes up the website. This "front end" (including the HTML, images, CSS, and JavaScript) can even be dynamically generated by code on their end, which means there is code executing on the server side that is not sent to your web browser, and that code may be critical to supporting the code that is sent to your web browser.

There is simply no way to "download" that server-side code, which is why many websites don't work properly when you save them.

The most common problem causing things to break is that websites use JavaScript to load content after the initial page response is sent to your browser. The HostMath site you are trying to save offline definitely uses a back end to retrieve JavaScript files that are critical to the site's functionality. In Firefox I get this error for several different JavaScript files when I try to open the site locally:

Loading failed for the <script> with source “file:///D:/Home/Downloads/hostmath
/HostMath%20-%20Online%20LaTeX%20formula%20editor%20and%20browser-
based%20math%20equation%20editor_files/extensions/asciimath2jax.js?rev=2.6.0”

See that ?rev=2.6.0 after the filename? That is a parameter passed to the back end (the web server) to determine which asciimath2jax.js file should be sent to your web browser. My D: drive isn't a web server, so when Firefox tries to load a file with a URL parameter attached to its name, it fails.

You could try downloading the file from HostMath manually and saving it in the right location without the ?rev=2.6.0, though. Then you would need to change the site's scripts and HTML to load the file from your drive without the URL parameter. This would have to be done for every script that failed to load.
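
If you want to automate that rewrite, something along these lines could work; this is only a sketch under the assumption that the referenced .js/.css files have already been saved locally without their query strings, and "HostMath_files" is a placeholder for whatever folder your browser's Save As produced.

    # Sketch: strip "?rev=..." style query strings from .js/.css references in a
    # saved page so a browser can resolve them as plain local files.
    # Assumes the referenced files were already saved locally without the query
    # string; "HostMath_files" is a placeholder for your own save folder.
    import re
    from pathlib import Path

    save_dir = Path("HostMath_files")

    # Matches a .js or .css reference followed by a query string, e.g. foo.js?rev=2.6.0
    pattern = re.compile(r"""(\.(?:js|css))\?[^"'\s>]*""")

    for path in save_dir.rglob("*"):
        if path.suffix.lower() not in {".html", ".htm", ".js", ".css"}:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        fixed = pattern.sub(r"\1", text)
        if fixed != text:
            path.write_text(fixed, encoding="utf-8")
            print(f"rewrote references in {path}")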

You will hit a dead end, though, if any JavaScript makes requests to a web service (an API) on the host website. This is typically done to offload computation for something that the site doesn't compute locally in the web browser, which means the back end is essential to running the front end.
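
One rough way to check for that up front (a heuristic sketch only, not a definitive test) is to scan the saved files for common network-call patterns; the folder name is again a placeholder.

    # Heuristic: flag saved JS/HTML lines that look like runtime network calls
    # (fetch, XMLHttpRequest, jQuery AJAX, WebSockets). Matches here suggest the
    # page depends on a back end and will not fully work offline.
    import re
    from pathlib import Path

    save_dir = Path("HostMath_files")  # placeholder save folder
    api_call = re.compile(r"\bfetch\s*\(|XMLHttpRequest|\$\.(?:ajax|get|post)\s*\(|\bWebSocket\s*\(")

    for path in save_dir.rglob("*"):
        if path.suffix.lower() not in {".js", ".html", ".htm"}:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
            if api_call.search(line):
                print(f"{path}:{lineno}: {line.strip()[:80]}")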

Romen
  • 1,308
0

Open the website that you want to save. Any web browser can quickly save the website that you are currently visiting. ... Open the "Save web page as" window. ... Give the saved page a name. ... Select a location to store the page. ... Select whether you want the entire web page or just the HTML. ... Open the saved webpage.

-1
  1. Clear the cache of the browser that you are using.
  2. Open the browser and go to the site you want to download.
  3. Open the cache folder.
  4. Copy all of the cached files into the same folder where the index.html/xml (etc.) file is.
  5. Go offline and test the downloaded page.