Remove duplicates in each line of a file

Question

How can I remove duplicates in each line, for example here?

1 1 1 2 1 2 3
5 5 4 1 2 3 3

I'd like to get this output:

1 2 3 
5 4 1 2 3

There are lots of lines (100,000) and in each line I want unique values. Perl might be the fastest, but how can I do it in Perl or Bash?

nerdwaller · Accepted Answer · 2012-12-19T19:24:41.953

Here is an option using awk:

awk '{ while(++i<=NF) printf (!a[$i]++) ? $i FS : ""; i=split("",a); print ""}' infile > outfile
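For readability, the same idea can be written out as a commented awk program inside a shell function. This is a sketch, not the answerer's exact command. The function name dedupe_fields is hypothetical. It prints fields with printf "%s", so a % in the data cannot be misread as a format directive, and it leaves no trailing space at the end of each line.

```shell
# Hypothetical helper: read lines on stdin and print each line with repeated
# fields removed, keeping the first occurrence of each value in its original order.
dedupe_fields() {
    awk '{
        sep = ""                        # nothing goes before the first kept field
        for (i = 1; i <= NF; i++)       # visit fields $1 .. $NF
            if (!seen[$i]++) {          # seen[$i] is 0 the first time, so this is true once
                printf "%s%s", sep, $i  # "%s" keeps a literal % in the data harmless
                sep = " "
            }
        print ""                        # end the output line with ORS (a newline)
        split("", seen)                 # empty the array before the next line
    }'
}
```

Usage: dedupe_fields < infile > outfile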

Edit: updated with comments:

while (++i<=NF)

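Starts the loop over the fields. i starts at 0 (uninitialized on the first line, reset by split on later lines) and is pre-incremented, because $0 is the whole line in awk and the first field is $1.

The loop runs while i is less than or equal to NF, awk's built-in "number of fields" variable, so it visits every field on the line. By default, fields are separated by runs of spaces and tabs; you can choose a different separator with -F (for example -F, for comma-separated input).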
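As an illustration of changing the separator, the sketch below assumes comma-separated input: -F, sets the input field separator and -v OFS=, is used to join the kept fields. The function name dedupe_csv_fields is hypothetical, and this simple split does not understand quoted CSV fields that contain commas.

```shell
# Hypothetical helper: the same per-line dedupe, but for comma-separated values.
dedupe_csv_fields() {
    awk -F, -v OFS=, '{
        out = ""; n = 0
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++)                       # first occurrence of this value on the line
                out = (n++ ? out OFS : "") $i      # join kept fields with OFS
        print out
        split("", seen)                            # reset the array for the next line
    }'
}
```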
printf (!a[$i]++) ? $i FS : ""

This is a ternary operation.

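a[$i]++ returns the current count for this value and then increments it. The first time a value appears on the line the count is 0, so !a[$i]++ is true and printf prints $i followed by the field separator FS (a space by default); on every later occurrence the count is nonzero and "" is printed instead. (You could drop the ! and swap the two branches, writing (a[$i]++) ? "" : $i FS, if you find that easier to read.)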
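To see the post-increment at work, this small sketch (the function name show_seen_counts is hypothetical) prints each input word, the count that a[$1]++ returned for it, and whether the ternary would keep it.

```shell
# Hypothetical demo: a[$1]++ yields the count before incrementing, so only
# the first occurrence of a value sees 0 (false), which makes !a[$1]++ true.
show_seen_counts() {
    printf 'a\na\nb\na\n' | awk '{
        old = a[$1]++
        print $1, old, (old ? "skip" : "keep")
    }'
}
```

Running show_seen_counts prints:

a 0 keep
a 1 skip
b 0 keep
a 2 skip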
i=split("",a)

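split("", a) splits the empty string into the array a, which deletes all of a's elements and returns 0. Assigning that 0 to i resets both the array of seen values and the field counter for the next line. It is a portable way to clear an array; many awks also accept delete a.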
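A short sketch that checks both effects (the function name show_split_reset is hypothetical): it counts the array's elements before and after the split and prints the value split returned.

```shell
# Hypothetical demo: split("", arr) empties arr and returns 0.
show_split_reset() {
    awk 'BEGIN {
        a["x"] = 1; a["y"] = 1
        before = 0; for (k in a) before++   # count elements: 2
        i = split("", a)                    # clears a and assigns 0 to i
        after = 0; for (k in a) after++     # count elements: 0
        print before, after, i
    }'
}
```

Running show_split_reset prints 2 0 0.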
print ""

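ends the output line. printf never writes a newline on its own, while print "" writes an empty record followed by the output record separator ORS (a newline by default). Without it, all the lines would run together:

1 2 3 5 4 1 2 3

instead of:

1 2 3
5 4 1 2 3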
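One way to confirm what print "" contributes is to change ORS so the separator becomes visible. In this sketch (the function name show_print_ors is hypothetical) ORS is set to |.

```shell
# Hypothetical demo: printf adds no separator of its own; print "" emits only ORS.
show_print_ors() {
    awk -v ORS='|' 'BEGIN {
        printf "%s", "1 2 3"
        print ""            # writes ORS, here "|"
        printf "%s", "5 4 1 2 3"
        print ""
    }'
}
```

Running show_print_ors prints 1 2 3|5 4 1 2 3| with no newline at all.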

slhck · Answer 2 · 2012-12-19T18:15:42.867

Since Ruby is available for every Linux distribution I know of:

ruby -e 'STDIN.readlines.each { |l| l.split(" ").uniq.each { |e| print "#{e} " }; print "\n" }' < test

Here, test is the file that contains the elements.

To explain what this command does (although Ruby can almost be read from left to right):

Read the input (which comes from < test through your shell)
Go through each line of the input
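Split the line on whitespace into an array (split(" ") is a special case in Ruby: like awk, it splits on runs of whitespace and ignores leading and trailing whitespace)
Get the unique elements from this array (uniq keeps the first occurrence of each element, in order)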
For each unique element, print it, including a space (print "#{e} ")
Print a newline once we're done with the unique elements
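A streaming variant of the same Ruby approach, sketched as a shell function (the name dedupe_ruby is hypothetical): -n wraps the script in a loop over input lines and -a autosplits each line on whitespace into $F, so the input is processed one line at a time instead of being read whole with readlines, and join(" ") avoids the trailing space.

```shell
# Hypothetical wrapper around a Ruby one-liner.
# -n: loop over input lines; -a: autosplit each line on whitespace into $F.
dedupe_ruby() {
    ruby -ane 'puts $F.uniq.join(" ")'
}
```

Usage: dedupe_ruby < test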

score 2 · Answer 3 · answered Dec 19 '12 at 21:03

Not pure bash, but ...:

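while read -r line; do
    # $line is left unquoted on purpose so it splits into one word per value;
    # sort and drop duplicates, then rejoin the values with spaces
    printf '%s\n' $line | sort -u | tr '\n' ' '
    echo    # end the output line
done < file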

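As a byproduct, the values on each line come out sorted rather than in their original order. sort -u compares them as text, so 10 sorts before 2; use sort -nu to sort numerically.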

glenn jackman
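If you want to keep the original order and avoid external commands inside the loop, the sketch below does the same in plain POSIX shell. The function name dedupe_line is hypothetical. It runs a shell loop over every value, so for 100,000 lines it will be much slower than the awk answer; treat it as an illustration rather than a replacement.

```shell
# Hypothetical helper: order-preserving dedupe of the words on one line,
# using only shell builtins (no awk, sort, or tr).
dedupe_line() {
    seen=' '                          # space-delimited list of words already kept
    out=''
    set -f                            # disable globbing so words like * stay literal
    for w in $1; do
        case $seen in
            *" $w "*) ;;              # already kept: skip it
            *) seen="$seen$w "
               out="${out:+$out }$w" ;;
        esac
    done
    set +f                            # re-enable globbing
    printf '%s\n' "$out"
}
```

Usage: while read -r line; do dedupe_line "$line"; done < file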
