This is more of a supplement/complement to @choroba's response above since he nailed it with "when you hear 'unique' think 'hash'". You should accept @choroba's answer :-)
Here I simplified the regex part of your question into a call to grep in order to focus on uniqueness, changed the data in your file a bit (so it could fit here) and saved it as dups.log:
# dups.log
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.
This one-liner gives the output below:
perl -E '++$seen{$_} for grep{/Warning/} <>; print keys %seen' dups.log
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
This is pretty much the same output you'd get with uniq dups.log | grep Warning (uniq only collapses adjacent duplicates, which is all this file has, and the hash hands back its keys in no particular order). It works because perl makes a hash key from each matching line it reads from the file, adding the key the first time it sees the line and incrementing its value (with ++$seen{$_}) each time it sees that key again. For perl, "same key" here means a line that is a duplicate. Try printing values %seen, or using -MDDP and p %seen, to get a sense of what is going on.
To get your output, @choroba's regex adds the capture (instead of the whole line) to the hash:
perl -nE '/:?(\S+?)[:\s]+Warning/ && ++$seen{$1} }{ say for keys %seen' dups.log
but, just as with the whole-line method above, the regex creates only one copy of each key (from the match and capture) and then increments it with ++, so in the end you get "unique" keys, à la uniq, in the %seen hash.
It's a neat perl trick you never forget :-)
References:
- The SO question has some good explanations of the perl idiom for uniq using a hash as per @choroba.
- This is touched on in perlfaq4 which describes the %seen{} hash trick.
- Perlmaven shows how to make your own "home made" uniq using this approach.
- ...