0

I've got a dataset that goes like this:

AAAAA 11111 Data1
AAAAA 11111 Data2
AAAAA 11111 Data3
AAAAA 11112 Data4
AAAAA 11112 Data5
AAAAA 11112 Data6
AAAAA 11112 Data7
AAAAA 11113 Data8
AAAAA 11114 Data9

And so on. I want to filter according to the 2nd field and then run a uniq to only pull the FIRST entry. In this case, I want the output to be:

AAAAA 11111 Data1
AAAAA 11112 Data4
AAAAA 11113 Data8
AAAAA 11114 Data9

This seems like it should be pretty easy, but the method is escaping me. Any help?

choroba
  • 20,299
Fyyz
  • 13

3 Answers

1

You can use sort to do the work:

sort -k2,2 -u

-k2,2 restricts the sort key to the 2nd column only; -u outputs just one line for each distinct key.
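As a quick check, here is a minimal sketch that runs the command on the sample data from the question. The variable names input and sorted_unique are only for illustration. With GNU sort the line kept for each key should be the first one in input order, but POSIX does not spell out which line of a group survives, so it is worth verifying on your own system.

```shell
# Sample input from the question, held in a variable for illustration.
input='AAAAA 11111 Data1
AAAAA 11111 Data2
AAAAA 11111 Data3
AAAAA 11112 Data4
AAAAA 11112 Data5
AAAAA 11112 Data6
AAAAA 11112 Data7
AAAAA 11113 Data8
AAAAA 11114 Data9'

# Sort on field 2 only and keep one line per distinct value of that field.
sorted_unique=$(printf '%s\n' "$input" | sort -k2,2 -u)

# One line remains for each of the four keys 11111 through 11114.
printf '%s\n' "$sorted_unique"
```

Keep in mind that the output comes back ordered by the 2nd column, which matches the input order here only because the sample is already sorted on that field.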

choroba
  • 20,299
0

There's an idiomatic piece of awk to do it:

awk '!seen[$2]++' file

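This prints a line only the first time its 2nd-column value is seen: seen[$2]++ evaluates to 0 on the first occurrence, so !seen[$2]++ is true and awk runs its default action, which is to print the line.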
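To see how this behaves on input that is not sorted, here is a small sketch. The four input lines are made up for illustration and a here-document stands in for the file; first_per_key is just a name picked for the example.

```shell
# Hypothetical, deliberately unsorted sample; the here-document replaces "file".
first_per_key=$(awk '!seen[$2]++' <<'EOF'
AAAAA 11112 Data4
AAAAA 11111 Data1
AAAAA 11112 Data5
AAAAA 11111 Data2
EOF
)

printf '%s\n' "$first_per_key"
# prints:
# AAAAA 11112 Data4
# AAAAA 11111 Data1
```

The first line for each key is kept and the original input order is preserved, so no sorting is needed beforehand.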

glenn jackman
  • 27,524
0

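You can use the command below. It reverses each line so that uniq -s 6 can skip the last field, which means it only works when that field is exactly 5 characters long, as in the sample data: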

sort new.txt | rev | uniq -s 6 | rev

Hope this helps

BDRSuite
  • 6,378