486

How can I download all pages from a website?

Any platform is fine.

Robotnik
  • 2,645
joe
  • 12,431

16 Answers

440

HTTRACK works like a champ for copying the contents of an entire site. This tool can even grab the pieces needed to make a website with active code content work offline. I am amazed at the stuff it can replicate offline.
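
HTTrack also ships a command-line version; a minimal sketch (the URL, output directory, and filter below are placeholders, not from the original answer) looks something like:

httrack "https://example.com/" -O ./example-mirror "+*.example.com/*" -v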

This program will do all you require of it.

Happy hunting!

Axxmasterr
  • 7,966
354

Wget is a classic command-line tool for this kind of task. It comes with most Unix/Linux systems, and you can get it for Windows too. On a Mac, Homebrew is the easiest way to install it (brew install wget).
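
For instance, typical install commands (assuming Homebrew on macOS, apt on Debian/Ubuntu, and the community Chocolatey package on Windows) would be roughly:

brew install wget          # macOS (Homebrew)
sudo apt-get install wget  # Debian/Ubuntu
choco install wget         # Windows (Chocolatey)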

You'd do something like:

wget -r --no-parent http://example.com/songs/

For more details, see the Wget Manual and its examples.

Jonik
  • 5,940
239

Use wget:

wget -m -p -E -k www.example.com

The options explained:

-m, --mirror            Turns on recursion and time-stamping, sets infinite
                        recursion depth, and keeps FTP directory listings.
-p, --page-requisites   Get all images, etc. needed to display HTML page.
-E, --adjust-extension  Save HTML/CSS files with .html/.css extensions.
-k, --convert-links     Make links in downloaded HTML point to local files.
-np, --no-parent        Don't ascend to the parent directory when retrieving
                        recursively. This guarantees that only the files below
                        a certain hierarchy will be downloaded. Requires a slash
                        at the end of the directory, e.g. example.com/foo/.
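
Note that -np is not part of the command above; as a sketch, combining it with the other flags (example.com/foo/ being a placeholder) keeps the mirror restricted to one subtree:

wget -m -p -E -k -np https://www.example.com/foo/
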
8

Internet Download Manager has a Site Grabber utility with a lot of options, which lets you completely download any website you want, the way you want it.

  1. You can set the limit on the size of the pages/files to download

  2. You can set the number of branch sites to visit

  3. You can change the way scripts/popups/duplicates behave

  4. You can specify a domain, so that only the pages/files under that domain which meet the required settings will be downloaded

  5. The links can be converted to offline links for browsing

  6. There are templates which choose the above settings for you


The software is not free, however; try the evaluation version to see if it suits your needs.

Gareth
  • 19,080
Lazer
  • 18,407
7

I like Offline Explorer.
It's shareware, but it's very good and easy to use.

Eran
  • 3,479
6

Power wget

While wget was already mentioned, this resource and command line were so seamless I thought it deserved a mention:

wget -P /path/to/destination/directory/ -mpck --user-agent="" -e robots=off --wait 1 -E https://www.example.com/
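
For reference, here is what those flags do (per the wget manual), in the style of the earlier answer:

-P                Save all files under the given destination directory.
-m                Mirror: recursion and time-stamping with infinite depth.
-p                Get the page requisites (images, CSS) needed to display each page.
-c                Continue partially downloaded files if the run is interrupted.
-k                Convert links in downloaded pages for local viewing.
--user-agent=""   Send an empty User-Agent instead of identifying as Wget.
-e robots=off     Ignore robots.txt exclusions.
--wait 1          Wait one second between requests.
-E                Save HTML/CSS files with matching extensions.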

See this code explained on explainshell

Shwaydogg
  • 785
6

You should take a look at ScrapBook, a Firefox extension. It has an in-depth capture mode.


Gareth
  • 19,080
webjunkie
  • 121
4

Teleport Pro is another free solution that will copy down any and all files from whatever your target is (also has a paid version which will allow you to pull more pages of content).

Ashildr
  • 2,770
Pretzel
  • 466
2

How can I download an entire website?

Here is an example of how to download not an entire website, but just a subdomain, including all its subdomains:

wget -E -k -m -np -p https://www.mikedane.com/web-development/html/

This worked just fine.¹

Apparently, this doesn't always get all the subdomains or PDFs, but it did get a fully functional copy that works fine offline.

Here are the meanings of the flags used, according to the Linux man page:²

-E   – will cause the suffix .html to be appended to the local filename
-k   – converts the links to make them suitable for local viewing
-m   – turns on recursion and time-stamping, infinite recursion depth
-np – only the files below a certain hierarchy will be downloaded
-p   – download all files necessary to properly display the pages

Reference


¹ If you try it, expect the download to be about 793 KiB.
In a previous version, I had index.html at the end of the URL. This is unnecessary. It might even make the download fail. But the ending forward slash, /, should not be left out.

² Concerning the -np flag, the exception is when there are dependencies outside the hierarchy.
For example, I made a download for which the referred CSS files are in a different subdomain.
Yet, the subdomain that has the CSS files was also downloaded, which is what we want, of course.
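
If wget does not pick up such cross-host assets on its own, it can also be told to span hosts explicitly; a sketch, with both domains as placeholders:

wget -E -k -m -np -p -H -D example.com,static.example.com https://www.example.com/foo/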

Henke
  • 1,261
2

Try BackStreet Browser.

It is a free, powerful offline browser: a high-speed, multi-threading website download and viewing program. By making multiple simultaneous server requests, BackStreet Browser can quickly download an entire website or part of a site, including HTML, graphics, Java applets, sound and other user-definable files, and it saves everything to your hard drive, either in native format or as a compressed ZIP file, for offline viewing.


Gareth
  • 19,080
joe
  • 12,431
2

For Linux and OS X: I wrote grab-site for archiving entire websites to WARC files. These WARC files can be browsed or extracted. grab-site lets you control which URLs to skip using regular expressions, and these can be changed when the crawl is running. It also comes with an extensive set of defaults for ignoring junk URLs.

There is a web dashboard for monitoring crawls, as well as additional options for skipping video content or responses over a certain size.
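
A minimal invocation might look like the following; the ignore-set names and dashboard address are taken from the project README, so treat them as assumptions to check against grab-site --help:

grab-site 'https://example.com/' --igsets=blogs,forums --no-offsite-links
gs-server    # starts the web dashboard (by default at http://127.0.0.1:29000/)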

Ivan
  • 1,137
1

You can use free online tools that will make a zip file of all the contents at a given URL.

GorvGoyl
  • 247
1

Cyotek WebCopy also seems to be a good alternative. For my situation, trying to download a DokuWiki site, it currently seems to lack support for CSRF/SecurityToken. That's why I went with Offline Explorer instead, as already mentioned in an answer above.

0

A1 Website Download for Windows and Mac is yet another option. The tool has existed for nearly 15 years and has been continuously updated. It features separate crawl and download filtering options with each supporting pattern matching for "limit to" and "exclude".

Tom
  • 469
-1

The venerable FreeDownloadManager.org has this feature too.

Free Download Manager has it in two forms: Site Explorer and Site Spider:

Site Explorer
Site Explorer lets you view the folder structure of a web site and easily download necessary files or folders.
HTML Spider
You can download whole web pages or even whole web sites with HTML Spider. The tool can be adjusted to download files with specified extensions only.

I find Site Explorer useful for seeing which folders to include/exclude before you attempt to download the whole site - especially when, for example, there is an entire forum hiding in the site that you don't want to download.

-5

I believe Google Chrome can do this on desktop devices; just go to the browser menu and click the option to save the web page.

Also note that services like Pocket may not actually save the website, and are thus susceptible to link rot.

Lastly note that copying the contents of a website may infringe on copyright, if it applies.

jiggunjer
  • 1,571