
Possible Duplicate:
How can I download an entire website

I frequently encounter webpages that offer manual pages or other info accessible only via a table of contents consisting of links to individual chapters or paragraphs. Often the individual leaf pages then consist of a few lines only, so traversing the entire tree is extremely cumbersome.

What I am seeking is a tool that would let me pull all pages referenced by the links on a starting page and combine them into a single concatenated HTML document, so that one could, for example, save that page and/or scroll linearly through all child pages without having to click and go back 1000 times. It would also make it possible to print the entire collection as a manual, search through it in one go, etc.

Does anyone know a good tool for this? Ideally it would offer some exclusion criteria (e.g., ignore all "back" links, or the links to help or home pages that appear on every page).

2 Answers


You could use wget in mirror mode:

C:\MySites\> wget -m http://mymanuals.com/manuals/foobar

This would mirror the whole http://mymanuals.com/manuals/foobar site.
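Since the question also asks about excluding things like "back" or help links, it is worth noting that wget can skip files and directories while mirroring. A rough sketch, reusing the example URL above (the -R patterns and the -X directory are hypothetical and would need to match the actual site):

wget -m -np -k -R "back*,help*" -X /home http://mymanuals.com/manuals/foobar

Here -np keeps wget from climbing above the starting directory, -k rewrites links so the mirrored pages reference each other locally, -R rejects files whose names match the given patterns, and -X skips whole directories.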

The other thing I have used with quite good success is HTTrack, which again mirrors a website for you, but with a nice GUI front-end.
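For what it's worth, HTTrack can also be driven from the command line if you would rather script it than use the GUI; a minimal sketch (the output directory is arbitrary):

httrack http://mymanuals.com/manuals/foobar -O ./foobar-mirror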

Majenko

Use wget to get all the pages, then xhtml2pdf and pdftk to combine them into a single document.
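A rough sketch of that pipeline, assuming the mirror from the first answer and Unix-style tools (paths and the output file name are placeholders, and the page order simply follows the sorted file names):

wget -m -np -k http://mymanuals.com/manuals/foobar
find mymanuals.com -name '*.html' | while read -r f; do xhtml2pdf "$f" "$f.pdf"; done
pdftk $(find mymanuals.com -name '*.pdf' | sort) cat output manual.pdf

xhtml2pdf converts each fetched HTML page to its own PDF, and pdftk's cat operation merges them into one document; you may need to reorder the file list by hand so the result matches the original table of contents.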

l0b0