I have a list of names and IDs (50 entries):
cat input.txt
name    ID
Mike    2000
Mike    20003
Mike    20002
There is also a huge gzipped file (13 GB):
zcat clients.gz
name    ID  comment
Mike    2000    foo
Mike    20002   bar
Josh    2000    cake
Josh    20002   _
My expected output is:
NR  name    ID  comment
1    Mike   2000    foo
3    Mike   20002   bar
Each $1"\t"$2 in clients.gz is a unique identifier. Some entries from input.txt may be missing from clients.gz, so I would like to add an NR column (the entry's position in input.txt) to my output to find out which ones are missing. I would like to use zgrep, because awk takes a very long time (I assume because I had to zcat the file to decompress it first?).
I know that zgrep 'Mike\t2000' does not work. I imagine I can handle the NR column with awk's FNR.
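To illustrate the tab problem: a minimal sketch, assuming the fields really are tab-separated and using a made-up one-line sample instead of clients.gz. Inside plain single quotes grep is handed a backslash and a "t" rather than a tab (at least GNU grep does not read that as a tab in its default regex mode), whereas a pattern containing a real tab character matches. The variable names `row`, `tab`, `hits_plain` and `hits_tab` are only for this illustration.

```shell
# Hypothetical sample row with real tab characters between the fields.
row=$(printf 'Mike\t2000\tfoo')

# A literal tab, built portably; in bash, $'\t' is a shorter spelling.
tab=$(printf '\t')

# Plain single quotes: grep sees backslash + "t", not a tab.
# grep -c exits non-zero when nothing matches, hence the "|| true".
hits_plain=$(printf '%s\n' "$row" | grep -c 'Mike\t2000' 2>/dev/null || true)

# A real tab in the pattern: -F treats it as a fixed string and
# -w keeps 2000 from also matching 20002.
hits_tab=$(printf '%s\n' "$row" | grep -cFw "Mike${tab}2000" || true)

echo "plain: $hits_plain, tab: $hits_tab"
```

The same options should carry over to zgrep, since it passes them on to grep.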
So far I have:
awk -v q="'" '
NR > 1 {
print "zcat clients.gz | zgrep -w $" q$0q
}' input.txt |
bash > subset.txt
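Note that the pipeline above decompresses clients.gz once per entry in input.txt, so about 50 times. For comparison, here is a rough sketch of a single-pass variant. It is not tested on the real 13 GB file. It assumes both files are tab-separated with one header line each, and it uses small made-up stand-ins for input.txt and clients.gz in a scratch directory. `gzip -dc` is used in place of zcat, which should be equivalent for .gz files.

```shell
# Scratch directory with small, hypothetical stand-ins for the real files.
tmp=$(mktemp -d)
printf 'name\tID\nMike\t2000\nMike\t20003\nMike\t20002\n' > "$tmp/input.txt"
printf 'name\tID\tcomment\nMike\t2000\tfoo\nMike\t20002\tbar\nJosh\t2000\tcake\nJosh\t20002\t_\n' |
  gzip > "$tmp/clients.gz"

# First file (input.txt): remember each name<TAB>ID key with its entry number.
# Second file ("-", the decompressed stream): print rows whose key was remembered.
result=$(gzip -dc "$tmp/clients.gz" |
  awk -F'\t' -v OFS='\t' '
    NR == FNR { if (FNR > 1) nr[$1 FS $2] = FNR - 1; next }  # skip the header of input.txt
    FNR == 1  { print "NR", $0; next }                       # header of clients.gz
    ($1 FS $2) in nr { print nr[$1 FS $2], $0 }
  ' "$tmp/input.txt" -)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Rows come out in the order they appear in clients.gz, not in input.txt order, and any entry number that is absent from the NR column corresponds to an entry that was not found.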