Remove non-duplicate lines in Linux

Question

How can I remove non-duplicate lines from a text file using any Linux program such as sed, awk, or something else?

Example:

abc
bbc
abc
bbc
ccc
bbc

Result:

abc
bbc
abc
bbc
bbc

The second list has ccc removed because it had no duplicate lines.

Is it also possible to remove the lines that are non-duplicate AND the lines that have only 2 duplicates, and leave those that have more than 2 duplicate lines?

MariusMatutiae · Accepted Answer · 2016-10-21T20:00:03.843

The solutions posted by others do not work on my Debian Jessie: they keep a single copy of any duplicate line, while it is my understanding of the OP that all copies of the duplicate lines are to be kept. If I have understood the OP right, then ...

The following command
```
awk '!seen[$0]++' file
```
removes all duplicate lines: it prints only the first occurrence of each line.
The following command
```
awk 'seen[$0]++' file 
```
outputs all the duplicates, but not the original copy: i.e., if a line appears n times, it outputs the line n-1 times.
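Then the command
```
awk 'seen[$0]++' file > temp && awk '!seen[$0]++' temp > temp1 && cat temp1 >> temp
```
solves your problem: the first awk writes n-1 copies of each line that appears n>1 times, and the second pass, run on temp so that unique lines stay out, adds back one copy of each. The lines are not in the original order.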
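If the original order matters, one possible alternative is a two-pass awk that counts first and filters second. This is only a sketch, not part of the answer above; the function name `keep_duplicated` is made up for illustration, and it assumes the input is a regular file that can be read twice (not a pipe).

```shell
# Hypothetical helper: print every copy of each line that occurs more than once,
# keeping the original order. The file is passed to awk twice on purpose.
keep_duplicated() {
    awk '
        NR == FNR { count[$0]++; next }   # pass 1: tally each distinct line
        count[$0] > 1                     # pass 2: print lines seen more than once
    ' "$1" "$1"
}
```

Called as `keep_duplicated file`, this needs no temporary files.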
If you want only the lines which have two or more duplicates (i.e. lines that appear at least three times), you can now iterate the above:
```
awk 'seen[$0]++' file | awk 'seen[$0]++' > temp
```
keeps n-2 copies of the lines which appear n>2 times. Now
```
awk '!seen[$0]++' temp > temp1
```
removes all duplicate lines from the temp file, and you can now obtain what you wish (i.e. all n copies of only the lines which appear n>2 times) as follows:
```
cat temp1 >> temp; cat temp1 >> temp
```
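If you need to do this for lines which appear more than N times (N ≥ 1), the following command
```
awk -v N=3 'seen[$0]++ && seen[$0] > N' file
```
(with N set to whatever you need; 3 is only an example) is simpler than chaining N times the command awk 'seen[$0]++' file. Like the chain, it prints n-N copies of each such line.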
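For the second part of the question (keep only lines that occur often enough, with all their copies), a single awk reading from standard input could look like the sketch below. The function name `keep_min_count` and its `min` parameter are hypothetical, and the whole input is held in memory, so this assumes the file is not huge.

```shell
# Hypothetical helper: keep every copy of the lines that occur at least
# min times (first argument), in input order. Reads standard input.
keep_min_count() {
    awk -v min="$1" '
        { line[NR] = $0; count[$0]++ }      # remember each line and tally it
        END {
            for (i = 1; i <= NR; i++)       # replay the input in order
                if (count[line[i]] >= min)
                    print line[i]
        }'
}
```

For example, `keep_min_count 3 < file` keeps only lines that appear three or more times, while `keep_min_count 2 < file` gives the result asked for in the first part of the question.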

score 7 · Answer 2 · answered Aug 02 '16 at 06:37

You can use the sort and uniq commands for this.

If your data is in the file abc.txt, then:

sort abc.txt | uniq -d

The output will be one copy of each duplicated line, in sorted order:

abc
bbc
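If a threshold other than "at least twice" is needed, the same sort/uniq idea might be extended with uniq -c, which prefixes each distinct line with its count. A rough sketch follows; the function name `distinct_with_min_count` is invented here, and like uniq -d it prints each qualifying line only once.

```shell
# Hypothetical helper: print each distinct line that occurs at least
# min times (first argument), once, in sorted order. Reads standard input.
distinct_with_min_count() {
    sort | uniq -c | awk -v min="$1" '
        $1 >= min {
            sub(/^ *[0-9]+ /, "")   # strip the count that uniq -c prepends
            print
        }'
}
```

With a threshold of 2, `distinct_with_min_count 2 < abc.txt` should list the same lines as uniq -d.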

UUU

score 0 · Answer 3 · answered Mar 24 '21 at 01:36

The answer by @UUU doesn't preserve the original line order. To keep the original order, use the following instead:

 printf '%s\n' abc bbc abc bbc ccc bbc | \
     nl -nrz     | \
     sort -k2    | \
     uniq -f1 -D | \
     sort        | \
     cut -f2

The printf command simply reproduces the sample input.
The nl command prepends line numbers, zero-padded (-nrz) so that a plain sort, without the -V flag, can restore the order later.
The first sort command sorts by the 2nd field onward, i.e. by the line text. By default fields are separated by blanks.
The uniq command compares adjacent lines (which is why you have to sort first); -f1 skips the first field, which is the line number, and -D (a GNU extension) prints every copy of each duplicated line.
The second sort restores the original order.
cut removes the leading line numbers.
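To run the same pipeline on a file rather than on the sample input, it could be wrapped in a function along these lines. This is a sketch under assumptions: the name `dups_in_order` is made up, GNU uniq is required for -D, and two small changes are mine rather than the answer's: `nl -ba` so blank lines are numbered too, and `cut -f2-` so lines that themselves contain tabs are not truncated.

```shell
# Hypothetical wrapper around the pipeline above; needs GNU uniq for -D.
dups_in_order() {
    nl -ba -nrz "$1" |          # number every line, blank ones included
        LC_ALL=C sort -k2 |     # group identical text together
        LC_ALL=C uniq -f1 -D |  # keep all copies of repeated text
        LC_ALL=C sort |         # back to input order via the padded numbers
        cut -f2-                # drop the number, keep the rest of the line
}
```

Usage would be `dups_in_order file`. The default number width of nl is 6 digits, so this assumes fewer than a million lines.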
