There is a plain (HTML) web page that is occasionally updated with text information. Is there any way to get notified (by any means) when the content of this particular page changes, given that the website doesn't provide any subscription mechanism?
1 Answer
Linux (and WSL)
You should be able to use the following even in the Windows Subsystem for Linux (WSL).
Under Linux you can use wget with the -N option (timestamp) from the directory in which you have previously downloaded the page.
wget -N https://example.com/your_page.html
It will download the file only if the server has a newer version, which wget determines by asking the server for the file's timestamp. wget prints different output depending on whether the download happened or not.
File downloaded
HTTP request sent, awaiting response... 200 OK
Not downloaded because it has the same timestamp
HTTP request sent, awaiting response... 304 Not Modified
So in the end you can build a command line or a script similar to:
wget -N https://example.com/your_page.html 2>&1 \
| grep "304 Not Modified" >/dev/null \
&& echo "Not downloaded, old one" \
|| echo "There is a new file"
Notes:
- -N activates the timestamp usage.
- 2>&1 redirects the standard error to the standard output, needed for the following grep.
- | grep "304 Not Modified" pipes the output of the previous command (wget) as input of grep, which selects the line with "304 Not Modified". It exits with status 0 (true) if it finds a match.
- && is the logical AND (aka then): it executes the following command if the previous exit status is 0. You can omit this part.
- || is the logical OR (aka else): it executes the following command if the exit status is not 0.
- Each \ is used in the shell to continue the command on a new line, and it must be the last character of the line.
- Replace echo this or that with your command...
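One limitation of the one-liner is that a failed request (for example, no network) also lacks the "304 Not Modified" line, so it would be reported as a new file. The following is a minimal sketch of a script variant that separates the three cases. The function names classify_wget_result and check_page are placeholders of mine, not part of wget, and LC_ALL=C is only there on the assumption that a localized wget might print translated messages.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical helpers, not wget features.

# Print "unchanged", "changed" or "error".
#   $1 = exit status of wget
#   $2 = captured wget output (stdout + stderr)
classify_wget_result() {
    status=$1
    log=$2
    if printf '%s\n' "$log" | grep "304 Not Modified" >/dev/null; then
        # The server said our local copy is still current.
        echo "unchanged"
    elif [ "$status" -ne 0 ]; then
        # wget failed (network problem, HTTP error, ...): not a real change.
        echo "error"
    else
        echo "changed"
    fi
}

# Run wget -N on the URL given as $1 and classify the outcome.
check_page() {
    # LC_ALL=C: assumption, keeps wget messages untranslated for grep.
    log=$(LC_ALL=C wget -N "$1" 2>&1)
    classify_wget_result "$?" "$log"
}
```

Calling check_page https://example.com/your_page.html from the directory that holds the previous download prints one of the three words, and you can run your notification command when the result is "changed".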
Final remarks
With wget -O - <URL> | md5sum you can create a checksum of the page and compare it with the one computed on the next run, without the need to keep a local copy...
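The checksum idea could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the function name checksum_changed and the state file path are hypothetical, and only the checksum is stored between runs rather than the page itself.

```shell
#!/bin/sh
# Sketch only: checksum_changed and the state file are hypothetical names.

# Read page content on stdin, compare its MD5 with the one stored in the
# state file ($1), store the new checksum, and return 0 if it differs.
checksum_changed() {
    state_file=$1
    # md5sum prints "<hash>  -" for stdin; keep only the hash.
    new_sum=$(md5sum | cut -d ' ' -f 1)
    old_sum=""
    if [ -f "$state_file" ]; then
        old_sum=$(cat "$state_file")
    fi
    printf '%s\n' "$new_sum" > "$state_file"
    [ "$new_sum" != "$old_sum" ]
}
```

A possible use is wget -q -O - https://example.com/your_page.html | checksum_changed "$HOME/.page_checksum" && echo "There is a new file". Note that the first run always reports a change because no checksum has been stored yet, and that a failed download produces empty input, which this sketch would also treat as a change.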
Hastur