
I've got a couple of text files (a.txt and b.txt) containing a bunch of URLs, one per line. Think of these files as blacklists. I want to sanitize my c.txt file, removing every line that matches an entry in a.txt or b.txt. My approach is to rename c.txt to c_old.txt and then build a new c.txt by grepping out the blacklisted entries:

    type c_old.txt | grep -f a.txt -v | grep -f b.txt -v > c.txt
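
Putting the two steps together, the batch file boils down to something like this (a simplified sketch; error handling omitted):

    rem Rotate the current list aside, then rebuild it without any
    rem lines matching the blacklists in a.txt and b.txt.
    ren c.txt c_old.txt
    type c_old.txt | grep -f a.txt -v | grep -f b.txt -v > c.txt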

For a long while, it seemed like my system was working just fine. However, lately, I've lost nearly everything that was in c.txt, and new additions are being removed despite not occurring in a.txt or b.txt. I have no idea why.

P.S. I'm on Windows 7, so grep has been installed separately. I'd appreciate solutions that don't require installing additional Linux tools.


Update: I've discovered one mistake in my batch file. I used ren c.txt c_old.txt without realising that ren fails outright when the target file already exists, leaving both files untouched. Thus, the type c_old.txt | ... pipeline always ran against the same stale data, and its output overwrote c.txt. This explains why new additions to c.txt were being wiped out, but it does not explain why so many entries that were in c.txt went missing in the first place.
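
In other words (a minimal illustration; this is standard cmd.exe behaviour for ren):

    rem First run: c_old.txt does not exist yet, so the rotation works.
    rem Every later run: c_old.txt already exists, so ren prints an
    rem error and leaves both files alone; the grep pipeline then
    rem re-filters the same stale c_old.txt into a fresh c.txt.
    ren c.txt c_old.txt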


1 Answer


Well, I don't have much data to go on, as there haven't been many new additions to a.txt and b.txt since I originally asked the question, but ever since fixing the ren issue (I replaced it with move /Y), things have been working smoothly.
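
For reference, the fixed rotation step now looks like this (same filenames as in the question):

    rem move /Y overwrites an existing c_old.txt, so the rotation
    rem succeeds on every run, not just the first one.
    move /Y c.txt c_old.txt
    type c_old.txt | grep -f a.txt -v | grep -f b.txt -v > c.txt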

So, things are working better. I'm still not sure how the initial data loss happened; my best guess is that I slipped up at some point while editing the script and didn't do my test runs in a safe environment.
