I'm trying to print duplicate lines from the filehandle, not remove them or anything else I see asked on other questions. I don't have enough experience with perl to be able to quickly do this, so I'm asking here. What's the way to do this?
- A lot depends on the size of input, sizes of lines and the potential number of duplicates. If the memory requirements are low, then the solutions with a `%duplicates` hash are adequate. – Sinan Ünür May 04 '11 at 13:57
- They are. I'm just using the filehandle to quickly check something. It doesn't look like there are any duplicates, so that's good. – Chris May 04 '11 at 14:00
 
4 Answers
Using the standard Perl shorthands:
my %seen;
while ( <> ) { 
    print if $seen{$_}++;
}
As a "one-liner":
perl -ne 'print if $seen{$_}++'
More data? This prints <file name>:<line number>:<line>:
perl -ne 'print +( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_" if $seen{$_}++'
Explanation of %seen: 
- `%seen` declares a hash. For each unique line in the input (which is coming from `while (<>)` in this case), `$seen{$_}` is a scalar slot in the hash, keyed by the text of the line (this is what `$_` is doing inside the hash's `{}` braces).
- The postfix increment operator (`x++`) yields the value of `x` for the expression and increments it afterwards. So if we haven't "seen" the line yet, `$seen{$_}` is undefined, but in a numeric context like this it is taken as 0, which is false.
- Then it's incremented to 1.
 
So, when the while loop begins to run, every line's count is effectively zero (if it helps, you can think of the lines as "not %seen"). The first time we see a line, perl takes the undefined value, which fails the if, and increments the count at the scalar slot to 1. Thus it is 1 or more for any future occurrence, at which point it passes the if condition and the line is printed.
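To make the walkthrough above concrete, here is a small sketch that records what `$seen{$_}++` evaluates to on each pass. The helper name `increment_trace` and the sample input are made up for illustration; they are not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value that `$seen{$line}++` yields
# for each input line, in order. A first sighting yields 0 (false);
# every later sighting yields the number of earlier sightings (true).
sub increment_trace {
    my @lines = @_;
    my %seen;
    return map { $seen{$_}++ } @lines;
}

# Sample input: "a" appears three times, "b" twice.
print join( ",", increment_trace(qw(a b a a b)) ), "\n";   # prints 0,0,1,2,1
```

Only the non-zero positions would pass the `if`, which is why the first occurrence of each line is never printed.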
As I said above, my %seen declares the hash, but with strict turned off (as in the one-liner), a variable can be created on the spot. So the first time perl sees $seen{$_}, it knows that I'm looking for %seen, finds that it doesn't exist yet, and creates it.
An added neat thing about this is that at the end, if you care to use it, you have a count of how many times each line was repeated.
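As a rough illustration of using those counts afterwards, the sketch below tallies lines first and then reports every line that occurred more than once. The sub name `count_lines`, the sample data, and the tab-separated report format are assumptions made for the example; an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each line occurs on a filehandle.
# Returns a hash reference mapping line text (without newline) to its count.
sub count_lines {
    my ($fh) = @_;
    my %seen;
    while ( my $line = <$fh> ) {
        chomp $line;
        $seen{$line}++;
    }
    return \%seen;
}

# Demo on an in-memory filehandle so the sketch runs without an input file.
my $input = "apple\npear\napple\nfig\napple\npear\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $count = count_lines($fh);
close $fh;

# Report only the lines that occurred more than once, with their totals.
for my $line ( sort keys %$count ) {
    print "$count->{$line}\t$line\n" if $count->{$line} > 1;
}
```

After the loop finishes, each value is the total number of occurrences, so anything greater than 1 is a duplicate.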
- Can you explain how $seen{$_}++ works exactly? I get that it's assigning the current line's value to a hash table, but what is the ++ doing here that makes it find duplicates? – Chris May 04 '11 at 14:08
- $seen{$_} refers to a value in the hash %seen, with the key $_, which is the current line. The ++ operator will increment the hash value. This means, the first time a key appears, its value will be false, and the print will not happen. The subsequent times it is seen, it will be >0, and so the print will execute, and print without args by default prints the $_ variable. – TLP May 04 '11 at 14:23
- Ah, so the key for the hash is the line, but the value is the number of times it was found in the file -1. – Chris May 04 '11 at 14:56
 
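Try this:
#!/usr/bin/perl -w
use strict;
use warnings;
my %duplicates;
# Reads from the script's __DATA__ section; use <> for files or STDIN.
while (<DATA>) {
    # Prints each line the first time it is seen (later repeats are skipped).
    print if !defined $duplicates{$_};
    $duplicates{$_}++;
}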
 
- I'd do `print unless exists $duplicates{$_}`. And +1 for `-w`, `use strict` and `use warnings`. – Blrfl May 04 '11 at 19:50
 
- This is like `sort file.txt | uniq -d` (print only duplicates) in a typical Unix shell. Is there a simple equivalent of `sort file.txt | uniq -u` (print only unique lines)? – G. Cito Jul 15 '13 at 21:07
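One possible approach to the `uniq -u` question, offered as a sketch and not as an answer from the thread: count every line first, then keep only those whose count is exactly 1. The sub name `unique_lines` is hypothetical. This assumes the whole input fits in memory, and unlike `sort | uniq -u` it preserves the original line order instead of sorting.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that occur exactly once,
# in the order in which they appeared in the input.
sub unique_lines {
    my @lines = @_;
    my %seen;
    $seen{$_}++ for @lines;                  # first pass: count every line
    return grep { $seen{$_} == 1 } @lines;   # second pass: keep the singletons
}

# Sample input: "a" is repeated, "b" and "c" occur once.
print "$_\n" for unique_lines(qw(a b a c));
```

To run it on real input, the sample list could be replaced with the lines read from a file or from STDIN.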