Filtering 2nd field from a data set and then using uniq on the output

Question

I've got a dataset that goes like this:

AAAAA 11111 Data1
AAAAA 11111 Data2
AAAAA 11111 Data3
AAAAA 11112 Data4
AAAAA 11112 Data5
AAAAA 11112 Data6
AAAAA 11112 Data7
AAAAA 11113 Data8
AAAAA 11114 Data9

And so on. I want to filter on the 2nd field so that, like a uniq applied to that field alone, only the FIRST entry for each value is kept. In this case, I want the output to be:

AAAAA 11111 Data1
AAAAA 11112 Data4
AAAAA 11113 Data8
AAAAA 11114 Data9

This seems like it would be pretty easy, but the method is just escaping me. Any help?

score 1 · Accepted Answer · answered Jan 04 '15 at 22:26


You can use sort to do the work:

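sort -k2,2 -u file

-k2,2 makes the sort key start and end at the 2nd field, so only that column is compared, and -u outputs only one line per distinct key. With GNU sort, -u keeps the first of each group of lines with equal keys, in input order, so this prints the first entry for each value. The output is sorted by the 2nd field.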
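A minimal, self-contained sketch of the command above, run on the question's sample data. It assumes GNU sort (other implementations may keep a different line from each group), and the sample helper function and the LC_ALL=C setting are additions for the demo, not part of the answer:

```shell
#!/bin/sh
# Byte-wise collation so the demo behaves the same in every locale
# (an assumption for reproducibility, not part of the original answer).
LC_ALL=C
export LC_ALL

# Hypothetical helper that prints the sample data from the question.
sample() {
  printf '%s\n' \
    'AAAAA 11111 Data1' \
    'AAAAA 11111 Data2' \
    'AAAAA 11111 Data3' \
    'AAAAA 11112 Data4' \
    'AAAAA 11112 Data5' \
    'AAAAA 11112 Data6' \
    'AAAAA 11112 Data7' \
    'AAAAA 11113 Data8' \
    'AAAAA 11114 Data9'
}

# -k2,2 : the key starts and ends at field 2, so field 3 is ignored
# -u    : one line per distinct key (GNU sort keeps the first of each equal run)
result=$(sample | sort -k2,2 -u)
printf '%s\n' "$result"
```

On GNU sort this prints the Data1, Data4, Data8 and Data9 lines, matching the output asked for in the question.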


choroba

20,299

score 0 · Answer 2 · answered Jan 04 '15 at 22:37


There's an idiomatic piece of awk to do it:

awk '!seen[$2]++' file

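This prints a line only the first time the value in its 2nd column is seen: seen[$2]++ evaluates to 0 on the first occurrence of a value, so !seen[$2]++ is true exactly once per value, and awk's default action prints the line. Unlike sort, it keeps the input order and does not need the input to be sorted.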
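To see the difference in ordering, here is a sketch that runs the awk idiom and sort -k2,2 -u side by side on a small, made-up input in which the 2nd-field values are not grouped together. The sample helper and its data are hypothetical, and GNU sort is assumed:

```shell
#!/bin/sh
# Byte-wise collation for a reproducible demo (an assumption, not part of the answer).
LC_ALL=C
export LC_ALL

# Hypothetical input: the 2nd-field values are interleaved, not grouped.
sample() {
  printf '%s\n' \
    'AAAAA 11112 Data4' \
    'AAAAA 11111 Data1' \
    'AAAAA 11112 Data5' \
    'AAAAA 11111 Data2'
}

# The idiom from the answer: seen[$2]++ yields 0 the first time a key appears,
# so the negated pattern is true once per key and the line is printed.
# Lines come out in input order.
awk_result=$(sample | awk '!seen[$2]++')

# For comparison: sort -k2,2 -u also keeps one line per key, but orders by key.
sort_result=$(sample | sort -k2,2 -u)

printf 'awk:\n%s\nsort:\n%s\n' "$awk_result" "$sort_result"
```

awk prints the 11112 line first because that is how the input starts, while sort prints the lines in key order; both keep Data4 and Data1 as the first entries for their keys.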


glenn jackman

27,524

score 0 · Answer 3 · answered Jan 04 '15 at 23:03


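You can use the command below. It reverses each line so the 3rd field comes first, lets uniq -s 6 skip the first 6 characters (the 5-character 3rd field plus the following space), and then reverses the lines back. This relies on every 3rd field being exactly 5 characters wide, as in the sample: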

sort new.txt | rev | uniq -s 6 | rev


Hope this helps
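The sketch below makes the fixed-width assumption behind uniq -s 6 visible. The lines helper and the awk fallback for a missing rev are hypothetical additions for the demo (rev is not part of POSIX and may be absent on minimal systems), and the Data10 line is made-up data:

```shell
#!/bin/sh
# Byte-wise collation for a reproducible demo (an assumption, not part of the answer).
LC_ALL=C
export LC_ALL

# Hypothetical fallback: if rev is not installed, reverse each line with awk.
if ! command -v rev >/dev/null 2>&1; then
  rev() {
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
  }
fi

# Hypothetical helper: prints its arguments one per line.
lines() { printf '%s\n' "$@"; }

# Reversed, the 3rd field comes first; uniq -s 6 skips "1ataD " and
# compares only the reversed first two fields.
lines 'AAAAA 11111 Data1' | rev   # prints: 1ataD 11111 AAAAA

# Works when every 3rd field is exactly 5 characters long, as in the sample.
ok=$(lines 'AAAAA 11111 Data1' 'AAAAA 11111 Data2' 'AAAAA 11112 Data4' \
  | sort | rev | uniq -s 6 | rev)

# With a 6-character 3rd field, skipping 6 characters leaves " 11111 AAAAA"
# (leading space) for Data10 but "11111 AAAAA" for Data1, so both lines survive.
broken=$(lines 'AAAAA 11111 Data1' 'AAAAA 11111 Data10' \
  | sort | rev | uniq -s 6 | rev)

printf 'ok:\n%s\nbroken:\n%s\n' "$ok" "$broken"
```

If the 3rd field can vary in width, a field-based approach such as the awk answer avoids the problem.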


BDRSuite

6,378

