
Here is a simple bash script to check HTTP status codes:

while read -r url; do
    urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "${url}" --max-time 5)
    echo "$url  $urlstatus" >> urlstatus.txt
done < "$1"

I am reading URLs from a text file, but the script processes only one at a time, which takes far too long. GNU parallel and xargs also process one line at a time (tested).

How can I process URLs simultaneously to improve the timing? In other words, I want to thread over the URL file rather than over bash commands (which is what GNU parallel and xargs do).

As suggested in an answer, the following code works fine, except that it doesn't process some of the last URLs:

urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 ) && echo "$url  $urlstatus" >> urlstatus.txt &

Maybe adding wait would help. Any suggestions?
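Something like this is the pattern I have in mind (sketched with a placeholder `check` function instead of the real curl call, just to illustrate the structure; `urls.txt` is sample input):

```shell
#!/bin/bash
# Placeholder for the real curl --write-out '%{http_code}' call
check() {
    echo "$1  200"
}

printf '%s\n' www.google.com pi.dk > urls.txt   # sample input file
: > urlstatus.txt                               # start with an empty output file

while read -r url; do
    check "$url" >> urlstatus.txt &             # run each check in the background
done < urls.txt
wait                                            # is this enough to catch the last URLs?
```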

2 Answers


In bash, you can use the & symbol to run programs in the background. Example:

for i in {1..100}; do
  echo "$i" >> numbers.txt &
done
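One caveat: if the script exits right after the loop, some background jobs may still be running and their output can be lost. A `wait` after the loop blocks until every background job has finished. A minimal sketch:

```shell
#!/bin/bash
: > numbers.txt                # start with an empty file
for i in {1..100}; do
  echo "$i" >> numbers.txt &   # each echo runs in the background
done
wait                           # block until all 100 background jobs have finished
# Without wait, the script could exit while lines are still being written.
```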

EDIT: Sorry, but the answer to your question in the comment was wrong, so I have edited this answer. Suggestion for your code:

urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 ) && echo "$url  $urlstatus" >> urlstatus.txt &
me_alok

GNU parallel and xargs also process one line at time (tested)

Can you give an example of this? If you use -j then you should be able to run much more than one process at a time.

I would write it like this:

doit() {
    url="$1"
    urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 )
    echo "$url  $urlstatus"
}
export -f doit
cat input.txt | parallel -j0 -k doit
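If it was xargs you tested, make sure you used its parallelism flag, -P. A sketch of the equivalent fan-out (with a stand-in `doit` instead of the curl call so it runs offline; GNU xargs assumed):

```shell
#!/bin/bash
# Stand-in for the curl-based doit so this example runs offline
doit() { echo "$1  200"; }
export -f doit

printf '%s\n' a.example b.example c.example > input.txt
# -n 1: one URL per invocation; -P 4: up to 4 invocations run in parallel
xargs -n 1 -P 4 bash -c 'doit "$1"' _ < input.txt > urlstatus.txt
```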

Given this input.txt:

Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this
www.google.com
pi.dk

I get the output:

Input file is txt file and lines are separated  as  000
ABC.Com  301
Bcd.Com  301
Any.Google.Com  000
Something  like this  000
www.google.com  302
pi.dk  200

Which looks about right:

000 if domain does not exist
301/302 for redirection
200 for success

I must say I am a bit surprised if the lines you have provided really are part of the input you actually use. None of these domains exist, and domain names containing spaces probably never will exist - ever:

Input file is txt file and lines are separated  as
Any.Google.Com
Something  like this

If you have not given input from your actual input file, you really should do that instead of making up stuff - especially if the made-up stuff does not resemble the real data.

Edit

Debugging why it does not work for you.

Please do not write a script, but run this directly in the terminal:

bash # press enter here to make sure you are running this in bash
doit() {
    url="$1"
    urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 )
    echo "$url  $urlstatus"
}
export -f doit
echo pi.dk | parallel -j0 -k doit

This should give:

pi.dk  200
Ole Tange