I want to download all the recorded shows of a radio show listed on this page: http://www.ellinofreneianet.gr/sounds.php?s=0&p=10&o=l

Each show has a page of this type: http://www.ellinofreneianet.gr/sound.php?id=7101
From all of these roughly 7 thousand pages I want to grab line 422 of the source code, where the download link is located.
It does not have to be done by line number; matching the regular expression ".=podcast/." works too.

How can I grab line 422 of every page of that type, or get the "=podcast/****.mp3" part, using shell scripts/commands?

NoName

1 Answer

Something like this?

for i in {7101..7200} ; do  wget -q -O - http://www.ellinofreneianet.gr/sound.php\?id\=$i | grep ".=podcast/." ; done

The wget options are -q (quiet: no progress output etc.) and -O - (write the downloaded page to stdout).
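The question also mentions taking line 422 directly. A minimal sketch of that variant, assuming the link really sits on the same line number of every page (which may well not hold); nth_line is a hypothetical helper name, and the three input lines are placeholders rather than real page content:

```shell
#!/bin/sh
# Hypothetical helper: print only line number $1 of whatever arrives on stdin.
nth_line() {
    sed -n "${1}p"
}

# Placeholder input standing in for a downloaded page.
printf '%s\n' 'first' 'second' 'third' | nth_line 2
# prints: second
```

In the loop it would take the place of grep, i.e. `wget -q -O - ... | nth_line 422`. Matching on the link text is likely to be more robust, since any change in the page layout shifts the line numbers.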

Not every page has an mp3 link there. Some even show a page that could be the 404 error page, and the pages with IDs starting from 0 also seem empty.

On the empty pages the URL ends right after podcast/ with a closing quote (podcast/"), so we can exclude them by requiring a character other than " after the slash:

... | grep ".=podcast/[^\"]"

To get only the .mp3 URLs, use

... | grep -o 'bitsnbytesplayer.php.*\.mp3'
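Both filters can be tried without hitting the server. The sketch below runs them on two invented HTML lines: the `file` parameter and the surrounding markup are assumptions and the real pages may look different. non_empty_links and extract_mp3 are hypothetical helper names:

```shell
#!/bin/sh
# Keep only lines where something other than a double quote follows "podcast/".
non_empty_links() {
    grep '.=podcast/[^"]'
}

# Print only the part from "bitsnbytesplayer.php" up to ".mp3".
extract_mp3() {
    grep -o 'bitsnbytesplayer.php.*\.mp3'
}

# Invented sample lines (assumption: the real markup may differ).
sample_with_link='<embed src="bitsnbytesplayer.php?file=podcast/show123.mp3" />'
sample_empty='<embed src="bitsnbytesplayer.php?file=podcast/" />'

printf '%s\n' "$sample_with_link" | extract_mp3
# prints: bitsnbytesplayer.php?file=podcast/show123.mp3

printf '%s\n' "$sample_with_link" "$sample_empty" | non_empty_links | wc -l
# counts 1 line: only the sample with a file name after podcast/ survives

printf '%s\n' "$sample_empty" | extract_mp3 || echo "no mp3 link found"
# prints: no mp3 link found
```

Because `.*` is greedy, a line containing several .mp3 links would be matched from the first bitsnbytesplayer.php up to the last .mp3.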

You found out yourself how to output the page URL before each mp3 URL. Here's an optimised variant of that, using only one HTTP request per page:

for i in {7100..7200} ; do \
    wget -q -O - http://www.ellinofreneianet.gr/sound.php\?id\=$i | \
    grep -o 'bitsnbytesplayer.php.*\.mp3' && \
    echo http://www.ellinofreneianet.gr/sound.php\?id\=$i ; done | sed -n 'h;n;p;g;p'

The && echo ... prints the page URL only if the preceding grep found an mp3 URL. The sed command then swaps the two lines of each pair, so that the page URL comes before its mp3 URL.
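What the sed script does is easiest to see on fixed input. A small sketch using made-up placeholder lines instead of real output; swap_pairs is a hypothetical name for the same sed call:

```shell
#!/bin/sh
# Swap the two lines of every pair read from stdin.
#   h  copy the first line of the pair into the hold space
#   n  read the second line into the pattern space (-n suppresses auto-print)
#   p  print the second line
#   g  copy the held first line back into the pattern space
#   p  print the first line
swap_pairs() {
    sed -n 'h;n;p;g;p'
}

# Placeholder lines in the order the loop emits them: mp3 URL, then page URL.
printf '%s\n' 'mp3-url-1' 'page-url-1' 'mp3-url-2' 'page-url-2' | swap_pairs
# prints:
# page-url-1
# mp3-url-1
# page-url-2
# mp3-url-2
```

This pairing relies on grep printing exactly one line per page. If a page ever produced two matches, the pairs would shift and the output order would no longer be reliable.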