7

How can I remove duplicates in each line, for example here?

1 1 1 2 1 2 3
5 5 4 1 2 3 3

I'd like to get this output:

1 2 3 
5 4 1 2 3

There are lots of lines (100,000) and in each line I want unique values. Perl might be the fastest, but how can I do it in Perl or Bash?

slhck
  • 235,242
Arash
  • 736

3 Answers

13

Here is an option using awk:

awk '{ while(++i<=NF) printf "%s", (!a[$i]++ ? $i FS : ""); i=split("",a); print ""}' infile > outfile

Edit: updated with comments:

  1. while (++i<=NF)

    Starts the loop over the fields. "i" is pre-incremented because $0 is the full line in awk, so the first test sets i to 1 and the loop starts at $1 (the first field).

    It keeps going while i is less than or equal to NF, awk's built-in variable for "Number of Fields" in the current line. The default field separator is whitespace; you can change it with the -F option.

  2. printf "%s", (!a[$i]++ ? $i FS : "")

    This is a ternary expression.

    !a[$i]++ is true only the first time a value is seen on the line: a[$i] is still 0 at that point, and the post-increment then records the value. In that case it prints $i followed by FS; otherwise it prints "". (You could remove the ! and swap $i FS and "" if you don't like it this way.) Passing the text through the "%s" format keeps a field that contains % from being interpreted as a format string.

  3. i=split("",a)

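    Splitting an empty string into a empties the array and returns 0, the number of pieces. Assigning that return value to i resets the counter, so both the array and i start fresh on the next line.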

  4. print ""

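    Ends the output line. printf does not add a newline on its own, whereas print "" prints an empty string followed by the output record separator (a newline by default). Without it, you would have an output of: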

    1 2 3 5 4 1 2 3 instead of
    1 2 3
    5 4 1 2 3
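
For readers who find the one-liner dense, the same four steps can be spelled out in a longer form. This is only a sketch, not the answer's original code: the function name dedupe_fields and the array name seen are made up for illustration, and it builds each output line in a variable so that no trailing separator is printed.

```shell
# dedupe_fields (hypothetical name): reads lines on stdin and prints each
# line with repeated fields removed, keeping the first occurrence of each.
dedupe_fields() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)        # walk the fields $1..$NF
            if (!seen[$i]++)             # true only the first time a value appears
                out = out (out == "" ? "" : OFS) $i
        print out                        # print appends the newline
        split("", seen)                  # empty the array before the next line
    }'
}

printf '1 1 1 2 1 2 3\n5 5 4 1 2 3 3\n' | dedupe_fields
```

With a file, this would be called as dedupe_fields < infile > outfile.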

nerdwaller
  • 18,014
5

Since ruby comes with any Linux distribution I know of:

ruby -e 'STDIN.readlines.each { |l| l.split(" ").uniq.each { |e| print "#{e} " }; print "\n" }' < test

Here, test is the file that contains the elements.

To explain what this command does—although Ruby can almost be read from left to right:

  • Read the input (which comes from < test through your shell)
  • Go through each line of the input
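  • Split the line on whitespace into an array (split(" ")); with a single-space separator, Ruby splits on runs of whitespace and ignores leading whitespace
  • Get the unique elements from this array (uniq), kept in order of first appearance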
  • For each unique element, print it, including a space (print "#{e} ")
  • Print a newline once we're done with the unique elements
slhck
  • 235,242
2

Not pure bash, but ...:

while read -r line; do
    # $line is deliberately unquoted so the shell splits it into one word per field
    printf "%s\n" $line | sort -u | tr '\n' ' '
    echo ''
done < file

The values within each line will be sorted as a byproduct, so the original order is not preserved.
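
If the original order matters, a pure-bash variant is possible. The following is a minimal sketch, assuming bash 4 or later (it relies on associative arrays); the function name dedupe_lines_bash is made up for illustration. A shell loop like this is likely to be noticeably slower than awk on 100,000 lines.

```shell
# dedupe_lines_bash (hypothetical name): pure bash, needs bash 4+.
# Reads lines on stdin and prints each with repeated words removed,
# keeping the first occurrence of each word.
dedupe_lines_bash() {
    local -a words out
    local -A seen
    local word
    # read -a splits the line into words; the || part handles a final
    # line that has no trailing newline.
    while read -r -a words || (( ${#words[@]} )); do
        seen=()
        out=()
        for word in "${words[@]}"; do
            if [[ -z ${seen[$word]+x} ]]; then   # word not seen on this line yet
                seen[$word]=1
                out+=("$word")
            fi
        done
        printf '%s\n' "${out[*]}"                # join the kept words with spaces
    done
}

printf '1 1 1 2 1 2 3\n5 5 4 1 2 3 3\n' | dedupe_lines_bash
```

With a file, this would be called as dedupe_lines_bash < file.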

glenn jackman
  • 27,524