I need a way to export all hyperlinks from a single webpage (not from an entire website), and a way to specify which links to export, for example only hyperlinks starting with https://superuser.com/questions/ and nothing else.
Exporting to a text file is preferred, with the results listed one URL per line:

https://superuser.com/questions/1  
https://superuser.com/questions/2  
https://superuser.com/questions/3
[...]
user198350
  • 4,269

2 Answers

If you are running on a Linux or a Unix system (like FreeBSD or macOS), you can open a terminal session and run this command:

# Fetch the page, put each href on its own line, keep the matching links, strip the markup
wget -O - 'http://example.com/webpage.htm' | \
sed 's/href=/\nhref=/g' | \
grep 'href="http://specify.com' | \
sed 's/.*href="//g;s/".*//g' > out.txt

A single line of HTML often contains several <a href> tags, so they have to be split apart first: the first sed inserts a newline before every href= so that no line holds more than one link. (Interpreting \n in the replacement as a newline is GNU sed behavior; BSD sed, as shipped with macOS and FreeBSD, may not do this.)
To extract links from several similar pages, for example all questions on the first 10 pages of this site, use a for loop:
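
As a more portable take on the splitting step described above, here is a minimal sketch that uses grep -o instead of inserting newlines with sed. The function name extract_links is made up for illustration, and the sketch assumes a grep that supports -o (GNU and BSD grep both do) and href values written in double quotes.

```shell
# extract_links PREFIX
# Hypothetical helper: reads HTML on stdin and prints, one per line,
# every double-quoted href value that starts with PREFIX.
extract_links() {
    prefix=$1
    grep -o 'href="[^"]*"' |                # one href per output line, even if a source line holds several
    sed 's/^href="//; s/"$//' |             # strip the leading href=" and the trailing quote
    awk -v p="$prefix" 'index($0, p) == 1'  # keep only URLs that begin with the prefix (literal match)
}

# Possible usage (the URL is a placeholder):
#   wget -O - 'http://example.com/webpage.htm' | extract_links 'http://specify.com' > out.txt
```

Because awk compares the prefix literally, characters such as . and ? in the URL are not treated as regular expression syntax.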

for i in $(seq 1 10); do
    # Quote the URL so the shell does not treat ? as a glob character
    wget -O - "http://superuser.com/questions?page=$i" | \
    sed 's/href=/\nhref=/g' | \
    grep -E 'href="http://superuser.com/questions/[0-9]+' | \
    sed 's/.*href="//g;s/".*//g' >> out.txt
done

Remember to replace http://example.com/webpage.htm with the URL of your actual page and http://specify.com with the prefix that the exported links should start with.
You are not limited to a fixed prefix: if you use egrep or grep -E in the command above, you can match the URLs to export against a regular expression instead.
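
To illustrate the regular expression option, here is a small sketch that filters a list of already extracted URLs (one per line) with grep -E. The function name filter_question_links and the exact pattern are assumptions chosen for this example: the pattern keeps only URLs whose path is /questions/ followed by a numeric ID.

```shell
# filter_question_links
# Hypothetical helper: reads URLs on stdin, one per line, and keeps only
# Super User question pages (http or https, /questions/<number>).
filter_question_links() {
    # [0-9]+ requires a numeric ID; (/|$) rejects paths such as /questions/tagged/...
    grep -E '^https?://superuser\.com/questions/[0-9]+(/|$)'
}

# Possible usage on a file of extracted links:
#   filter_question_links < out.txt > questions.txt
```

The dots are escaped so that they match a literal . rather than any character.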
If you're running Windows, consider using Cygwin. Remember to select the Wget, grep, and sed packages when installing it.

iBug
  • 11,645

If you are okay with using Firefox for this, you can use the add-on Snap Links Plus:

  1. Hold down the right mouse button and drag a selection around the links.

  2. When they are highlighted, press and hold Control while letting go of the right mouse button.

Yisroel Tech
  • 13,220