I have a list of ids in one file and a data file (~3.2 GB in size). I want to extract every line in the data file that contains one of the ids, together with the line that follows it. I ran the following:
grep -A1 -Ff file.ids file.data | grep -v "^-" > output.data
This worked, but it also matched ids as substrings of longer ids: for example, if the id is EA4, it also pulled out the lines with EA40.
So I tried the same command with the -w (--word-regexp) flag added to the first grep, to match whole words only. However, the command then ran for more than 1 hour (rather than ~26 seconds) and started using tens of gigabytes of memory, so I had to kill the job.
Why did adding -w make the command so slow and memory-hungry? How can I run this efficiently and get my desired output? Thank you.
file.ids looks like this:
>EA4
>EA9
file.data looks like this:
>EA4 text
data
>E40 blah
more_data
>EA9 text_again
data_here
output.data would look like this:
>EA4 text
data
>EA9 text_again
data_here
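
For anyone who wants to reproduce this on a small scale, here is a minimal sketch. The >EA40 record, the scratch directory, and the substring.data file name are additions for illustration only. The awk command is one possible exact-match approach, not a confirmed fix: it assumes the id is always the first whitespace-separated field of a header line, and its speed on the full-size file is not measured here.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample inputs; the >EA40 record is added here to trigger the substring match.
printf '%s\n' '>EA4' '>EA9' > file.ids
printf '%s\n' '>EA4 text' 'data' '>EA40 blah' 'more_data' \
              '>EA9 text_again' 'data_here' > file.data

# Original command: -F treats ">EA4" as a plain substring,
# so the ">EA40 blah" record is extracted as well.
grep -A1 -Ff file.ids file.data | grep -v "^-" > substring.data

# Sketch of an exact match (assumption: the id is the first field).
# First pass stores the ids as array keys; second pass prints a header
# line whose first field is a stored id, plus the line right after it.
awk 'NR == FNR { ids[$1]; next }
     $1 in ids { print; if ((getline nextline) > 0) print nextline }' \
    file.ids file.data > output.data
```

With these sample files, substring.data contains the >EA40 record while output.data does not.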