I have a tabular file with 6 columns. I need to add a 7th column that, for each row, holds the number of times that row's column 3 value occurs in the whole file. I did it in Excel by adding the formula
=countif(C:C,$C1)
But the files are huge, and I have lots of them.
For example, my input looks like this:
0   SL3.0ch03   7675648 21M GATCACTCCAAACTCATCATA   NM:i:2
0   SL3.0ch03   7675648 21M GATCACTCCAAACTCATCATA   NM:i:2
0   SL3.0ch03   7675648 21M GATCACTCCAAACTCATCATA   NM:i:2
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1
0   SL3.0ch03   7675649 21M CTCACTCCAAACTCATCATAC   NM:i:2
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1
And I need an output like this one:
0   SL3.0ch03   7675648 21M GATCACTCCAAACTCATCATA   NM:i:2  3
0   SL3.0ch03   7675648 21M GATCACTCCAAACTCATCATA   NM:i:2  3
0   SL3.0ch03   7675648 21M GATCACTCCAAACTCATCATA   NM:i:2  3
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1  5
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1  5
0   SL3.0ch03   7675649 21M CTCACTCCAAACTCATCATAC   NM:i:2  5
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1  5
0   SL3.0ch03   7675649 21M ATCACTCCAAACTCATCATAC   NM:i:1  5
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1  4
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1  4
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1  4
0   SL3.0ch03   7675650 21M TCACTCCAAACTCATCATACT   NM:i:1  4
I've tried a few things that I found:
awk '{h[$3]++}; END { for(k in h) print k, h[k] }' input.tab
That prints each distinct column 3 value with its count (the numbers I want in the 7th column), but not the rest of the columns. I also found that this code:
awk '{print $1,$2,$3,$4,$5,$6}'
prints all the columns, so I thought "this should work":
awk '{print $1,$2,$3,$4,$5,$6,$7};{h[$3]++}; END { for(k in h) print k, h[k] }' input.tab > output.tab
but it didn't. The best I could achieve was to print all 6 original columns with the counts appended at the bottom of the file, but I need the count on every line as a 7th column.
I'm familiar with basic shell commands, but not with the AWK language.
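For reference, here is a minimal sketch of one way the output described above could be produced: reading the file twice, counting column 3 on the first pass and appending the count to each line on the second. It assumes the file can be read twice (so not a pipe), and that the new column should be separated by a tab. The array name `count` and the three-row sample file are only illustrative; `input.tab` and `output.tab` are the names used in my attempts.

```shell
# Hypothetical sample: three rows taken from the example input, tab-separated.
printf '0\tSL3.0ch03\t7675648\t21M\tGATCACTCCAAACTCATCATA\tNM:i:2\n' >  input.tab
printf '0\tSL3.0ch03\t7675648\t21M\tGATCACTCCAAACTCATCATA\tNM:i:2\n' >> input.tab
printf '0\tSL3.0ch03\t7675649\t21M\tATCACTCCAAACTCATCATAC\tNM:i:1\n' >> input.tab

# The same file is given twice on the command line.
# Pass 1: NR==FNR holds only while the first copy is being read, so just count column 3.
# Pass 2: print the original line unchanged, then a tab, then the count for its column 3.
awk 'NR==FNR { count[$3]++; next } { print $0 "\t" count[$3] }' input.tab input.tab > output.tab
```

Only the distinct column 3 values and their counts are held in memory, not the lines themselves, which seems preferable for huge files. For many files, the same awk command could presumably be wrapped in a shell `for` loop.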