
I've got a system which needs to fetch the latest 200 lines of a very large public file every day. The file is exposed via a URL. Currently I run a simple script which does a wget, tails the last 200 lines into a different file, and then deletes the original download.
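
Roughly, the script looks like this (the URL and file names here are placeholders):

    URL="https://example.com/big.log"    # placeholder for the real URL
    wget -q -O big.log "$URL"            # download the entire ~250MB file
    tail -n 200 big.log > last200.txt    # keep only the last 200 lines
    rm big.log                           # delete the original again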

Because the original file is very large (about 250MB), most of the script's running time is taken up by downloading the file.

My system works fine, but it's annoying that it takes so long, especially since I'm often just waiting for it to finish.

I found suggestions such as this one, but that basically does the same as I do now: downloading the entire file and tailing it.

Does anybody know a way that I can tail the public file without downloading it entirely? All tips are welcome!

kramer65

2 Answers


If the server where the file is stored supports continued downloading, then you can start the download from any offset using the --start-pos option of wget.

You need to get the file size (using something like curl -I), estimate how many bytes the last 200 lines take up, and use the difference as the starting offset.
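
As a rough sketch (the URL is a placeholder, and the 500-bytes-per-line figure is just a deliberately generous guess you'd adjust to your data):

    URL="https://example.com/big.log"

    # Total size in bytes, taken from the Content-Length response header
    SIZE=$(curl -sI "$URL" | tr -d '\r' | awk 'tolower($1)=="content-length:" {print $2}')

    # Over-estimate the size of the last 200 lines so we never fetch too little
    OFFSET=$((SIZE - 200 * 500))

    # Fetch only the final chunk and keep the last 200 complete lines
    wget -q --start-pos="$OFFSET" -O - "$URL" | tail -n 200 > last200.txt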

efotinis

If you use the -c|--continue option, wget will just download the missing part and add it to your existing copy:

-c
--continue
    Continue getting a partially-downloaded file. This is useful when you want to finish up 
    a download started by a previous instance of Wget, or by another program. For instance:

    wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

    If there is a file named ls-lR.Z in the current directory, Wget will assume that it
    is the first portion of the remote file, and will ask the server to continue the 
    retrieval from an offset equal to the length of the local file. 

Note that this requires the server to support the HTTP "Range" header, exactly like the --start-pos option in @efotinis' answer. This is called byte serving.
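
Since you don't already have the first portion of the file on disk, one way to use -c here is to pre-create a placeholder file of the right size (e.g. with truncate) so wget only asks the server for the remainder. A sketch under the same assumptions as above (placeholder URL/filenames, rough per-line estimate):

    URL="https://example.com/big.log"

    SIZE=$(curl -sI "$URL" | tr -d '\r' | awk 'tolower($1)=="content-length:" {print $2}')

    # Create a sparse local file standing in for the part we don't care about;
    # its length becomes the offset wget resumes from
    truncate -s $((SIZE - 200 * 500)) big.log

    # wget finds big.log in the current directory and only fetches the rest
    wget -c -q "$URL"

    tail -n 200 big.log > last200.txt
    rm big.log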

xenoid