2

I have a large text file with a size of more than 30 megabytes. I want to remove all the lines which don't match some specific criteria, e.g. lines that don't have the string 'START'.

What's the easiest way to do this?

Gareth
  • 19,080
Shawn
  • 437

5 Answers

4

If the pattern is really that simple, grep -v will work:

grep -v START bigfile.txt > newfile.txt

newfile.txt will have everything from bigfile.txt except lines with "START".

(In case it isn't obvious, this is something you'll do in Terminal or another command-line tool.)

Doug Harris
  • 28,397
2

The original question asked how to remove the lines that didn't match a pattern. In other words, how to keep the lines that do match the pattern. Thus, no need for -v.

grep START infile.txt > outfile.txt

Note that grep can use regular expressions to do much more powerful pattern matching. The syntax is a bit obscure though.
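As a rough illustration of the regular-expression point, here is a minimal sketch that keeps only lines that begin with START rather than lines that merely contain it. The sample file contents are made up for the example; the file names follow the command above.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
printf '%s\n' 'START job 1' 'idle' 'RESTART pending' 'START job 2' > infile.txt

# ^ anchors the pattern to the beginning of the line, so a line such as
# "RESTART pending" is dropped even though it contains the text START.
grep '^START' infile.txt > outfile.txt

cat outfile.txt
```

With the sample input, outfile.txt should end up with just the two lines that start with START.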

Gareth
  • 19,080
1

Use GNU sed with the -i (in-place) option, which edits the file directly instead of writing the result to a new file.
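The answer doesn't give a command, so the following is only a sketch of what it might look like, assuming GNU sed and the bigfile.txt name used elsewhere in this thread. The sample contents are invented for the example.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
printf '%s\n' 'START a' 'skip me' 'b START' > bigfile.txt

# /START/ selects matching lines, ! negates the selection, d deletes.
# Net effect: every line that does NOT contain START is removed.
# -i rewrites bigfile.txt in place (GNU sed syntax).
sed -i '/START/!d' bigfile.txt

cat bigfile.txt
```

Since this overwrites the original, it may be worth running it on a copy first. Note that the BSD sed shipped with OS X expects an argument after -i (for example -i ''), so the exact invocation above is an assumption that only holds for GNU sed.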

1
grep -v START inputfile

should work. grep is standard on both macOS and Linux/Unix, and can be installed on MS Windows.

Option -v is for inverting the match - only output lines that do not contain the pattern (the inverse of the usual grep behaviour).
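To see the inversion side by side, here is a small sketch that splits a sample file into the lines that contain START and the lines that do not. The file contents and the two output file names are made up for the example.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
printf '%s\n' 'START one' 'two' 'three START' 'four' > inputfile

# Default behaviour: keep lines that contain the pattern.
grep START inputfile > matching.txt

# With -v: keep lines that do NOT contain the pattern.
grep -v START inputfile > nonmatching.txt

echo '--- matching ---';    cat matching.txt
echo '--- nonmatching ---'; cat nonmatching.txt
```

Every input line lands in exactly one of the two output files, so which command you want depends on whether START marks the lines to keep or the lines to drop.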

sleske
  • 23,525
1

For Windows Command Prompt (help find for options):

find /v "START" original_file.txt > new_file.txt

For Linux, OS X, etc. (man grep for options):

grep -v "START" original_file.txt > new_file.txt

For more complicated text matching, grep offers a lot more functionality than find. If you are on Windows, you can easily find a port of grep, or you can use Windows' findstr instead of find.