
I am looking for links (not the URLs of the pages themselves) written in job postings on a particular website. I would like to scan the website and copy all strings beginning with http or www on ALL the pages (about 1000).

I am on Windows 7 and I don't know how to run scripts. Can anybody suggest an efficient way of doing this?

Would I have to download all the HTML pages first? If so, what software should I use for downloading the pages and for scanning and copying out the strings?

M Singh

3 Answers


Keeping in mind that running scripts is not an option for you, one approach is to save the source code of a page (right-click -> view the source code, then save it). You can then open it in e.g. Notepad and search the content by pressing [Ctrl] + [F].

Another way would be to use NirSoft's URLStringGrabber: http://www.nirsoft.net/utils/url_string_grabber.html
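
If you ever do become comfortable running a script, the same save-and-search idea can be automated. Below is a minimal Python sketch, assuming the pages have already been saved into a folder; the folder name saved_pages and the output file links.txt are placeholders for illustration, not anything the tools above create:

    import re
    from pathlib import Path

    # Match strings beginning with http(s):// or www., stopping at
    # whitespace, quotes, or angle brackets that end an HTML attribute.
    URL_PATTERN = re.compile(r'(?:https?://|www\.)[^\s"\'<>]+')

    found = set()
    for page in Path("saved_pages").glob("*.htm*"):
        text = page.read_text(encoding="utf-8", errors="ignore")
        found.update(URL_PATTERN.findall(text))

    # One link per line, sorted, so duplicates across the ~1000 pages collapse.
    Path("links.txt").write_text("\n".join(sorted(found)), encoding="utf-8")
    print(f"Found {len(found)} unique links")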


You can achieve that easily in Opera: open the Links panel in the left pane, and you can copy all of them to the clipboard.


I find the program WinHTTrack useful for this purpose. There is a combination of options that allows you to download a single page but rewrite the URLs into a specific, absolute format, so that you can later search the raw HTML and be sure of catching almost all of the links.

  1. After setting the mirror name and progressing to the next screen, change the Action to "Download web site(s)".
  2. Put the URL of the page that contains more web pages in the "Web Addresses: (URL)" box.
  3. Select Options -> Experts Only.
  4. Change the "Rewrite Links: internal / external" to "Absolute URI / Absolute URL" (or, if you're only using the page for scraping URLs, "Absolute URL / Absolute URL").
  5. Press OK, then Next, then navigate through the options as usual.

More information about HTTrack can be found on the httrack tag.
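
Because the rewrite step makes every link absolute, the later search only has to look for http:// or https://. If you want to automate that pass over the mirror, here is a rough Python sketch; the mirror path is an assumption (WinHTTrack lets you pick it when you set the mirror name):

    import re
    from pathlib import Path

    # After the "Absolute URL" rewrite, every link in the mirrored
    # HTML starts with http:// or https://, so one pattern finds them all.
    ABSOLUTE_URL = re.compile(r'https?://[^\s"\'<>]+')

    links = set()
    # "C:/My Web Sites/mirror" is a placeholder; use whatever path
    # you chose when you created the mirror.
    for page in Path("C:/My Web Sites/mirror").rglob("*.htm*"):
        text = page.read_text(encoding="utf-8", errors="ignore")
        links.update(ABSOLUTE_URL.findall(text))

    for link in sorted(links):
        print(link)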

wizzwizz4