I'm currently parsing Apache logs with this command:
tail -f /opt/apache/logs/access/gvh-access_log.1365638400 |
grep specific.stuff. | awk '{print $12}' | cut -d/ -f3 > ~/logs
The output is a list of domains:
www.domain1.com
www.domain1.com
www.domain2.com
www.domain3.com
www.domain1.com
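(As an aside, I believe the grep, awk and cut stages could be collapsed into a single awk invocation, something along these lines, though that isn't really the point of the question:)

tail -f /opt/apache/logs/access/gvh-access_log.1365638400 |
awk '/specific.stuff./ { split($12, parts, "/"); print parts[3] }' > ~/logs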
In another terminal I then run this command:
watch -n 10 'cat ~/logs | sort | uniq -c | sort -n | tail -50'
The output is:
1023 www.domain2.com
2001 www.domain3.com
12393 www.domain1.com
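(The cat there isn't strictly needed; letting sort read the file directly should behave the same, I assume:)

watch -n 10 'sort ~/logs | uniq -c | sort -n | tail -50'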
I use this to monitor Apache stats in quasi real time. The trouble is that the logs get very big very fast, and I don't need the log file for anything other than the uniq -c count.
My question is: is there any way to avoid the temporary file? I don't want to hand-roll my own counter in my language of choice; I'd like to use some awk magic if possible.
Note that since I need to use sort, I assumed I had to use a temp file somewhere in the process, because sorting an open-ended stream is meaningless (whereas running uniq on a stream isn't).
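To illustrate what I'm imagining, here is a rough, untested sketch that keeps the counts in an awk array and redraws them periodically, so nothing ever touches disk. It assumes GNU awk (for PROCINFO["sorted_in"]) and reuses my field number and pattern from above, so treat the details as placeholders:

tail -f /opt/apache/logs/access/gvh-access_log.1365638400 |
awk '/specific.stuff./ {
        split($12, parts, "/")                   # same extraction as cut -d/ -f3
        count[parts[3]]++                        # tally per domain, in memory
    }
    NR % 100 == 0 {                              # every 100 input lines, redraw
        printf "\033[2J\033[H"                   # clear the screen, roughly like watch
        PROCINFO["sorted_in"] = "@val_num_asc"   # gawk: iterate by ascending count
        for (d in count) printf "%8d %s\n", count[d], d
        fflush()
    }'

The obvious catch is that it refreshes per lines read rather than every 10 seconds, but it shows the shape of what I'm after.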