6

I need some documentation about XUL but I do not have Internet access most of the time. So, I've tried to download the Mozilla Tutorial with the following command:

wget --no-parent -r -l 2 -p -k https://developer.mozilla.org/en/XUL_Tutorial

My intention was to download both the https://developer.mozilla.org/en/XUL_Tutorial page and its subpages (for example, https://developer.mozilla.org/en/XUL_Tutorial/Install_Scripts). However, even though I passed the --no-parent flag, it keeps getting pages such as https://developer.mozilla.org/index.php?title=Special:Userlogin&returntotitle=en%2FXUL+Tutorial%2FInstall+Scripts.

I do not understand why it happens. How could I achieve the behavior I intended?

studiohack
  • 13,477
brandizzi
  • 175

3 Answers

20

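You need a trailing slash at the end of the URL. HTTP has no notion of a directory, so wget relies on the URL to tell it what is one: without the slash, XUL_Tutorial is treated as a file name, and --no-parent then takes /en/ as the directory to stay under rather than /en/XUL_Tutorial/.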
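For illustration, here is the command from the question with only the URL changed. This is a minimal sketch: the `url` variable and the `case` check are just a convenience I added, and the `echo` makes it a dry run so nothing is downloaded until you remove it.

```shell
# Same options as in the question; only the URL changes.
url="https://developer.mozilla.org/en/XUL_Tutorial"

# Append a trailing slash unless one is already there, so that
# --no-parent treats XUL_Tutorial as a directory.
case "$url" in
  */) ;;
  *)  url="$url/" ;;
esac

# Dry run: print the command. Remove "echo" to start the download.
echo wget --no-parent -r -l 2 -p -k "$url"
```

Typing the slash by hand works just as well; the check only matters if the URL comes from somewhere else, such as a script argument.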

Dyax
  • 216
2

I was having a similar issue with this command:

wget -r -l1 --no-parent -nH "https://www.website.com/parent/directory/"

I believe there was an issue with https vs. http. I updated $HOME/.wgetrc to:

header = Accept-Encoding: none
header = Accept-Language: en-us,en;q=0.5
header = Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
header = Connection: keep-alive
user_agent = Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
referer = http://www.google.com/
robots = off

Then I changed https to http:

wget -r -l1 --no-parent -nH "http://www.website.com/parent/directory/"

The wget program no longer created folders (or retrieved files) from outside the specified directory hierarchy.

Dave Jarvis
  • 3,427
1

I had to disable gzip compression to make it work. I also changed the user agent, because some sites refuse requests that identify themselves as wget. This is what I put into my .wgetrc:

header = Accept-Encoding: none

user_agent = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6

Works great here.
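If you would rather not edit .wgetrc, the same two settings can probably be passed per run with wget's --header and --user-agent options. A rough sketch, assuming the placeholder URL from the earlier answer and a `ua` variable I introduced for readability; it only prints the arguments instead of running wget:

```shell
# One-off equivalent of the two .wgetrc lines above, passed as options.
# The URL is a placeholder taken from the earlier answer, not a real target.
ua="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
url="http://www.website.com/parent/directory/"

# Collect the arguments so the quoting stays intact.
set -- --header="Accept-Encoding: none" --user-agent="$ua" -r -l1 --no-parent "$url"

# Dry run: print one argument per line. Replace this line with: wget "$@"
printf '%s\n' wget "$@"
```

Settings given on the command line apply to that run only, so the rest of your wget usage is left alone.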

Gareth
  • 19,080