Easiest way to remove unwanted lines from a huge text file

Question

I have a large text file with a size of more than 30 megabytes. I want to remove all the lines which don't match some specific criteria, e.g. lines that don't have the string 'START'.

What's the easiest way to do this?

score 4 · Answer 1 · answered Jun 03 '10 at 02:02

If the pattern is really that simple, grep -v will work:

grep -v START bigfile.txt > newfile.txt

newfile.txt will have everything from bigfile.txt except lines with "START".

(In case it isn't obvious, you run this in Terminal or another command-line tool.)

score 2 · Answer 2 · edited Jul 09 '11 at 12:06

The original question asked how to remove the lines that didn't match a pattern. In other words, how to keep the lines that do match the pattern. Thus, no need for -v.

grep START infile.txt > outfile.txt

Note that grep also accepts regular expressions, which allow much more powerful pattern matching. The syntax is a bit obscure, though.
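As a rough sketch of what a regular expression adds here, the commands below compare a plain string match with an anchored pattern that only accepts lines beginning with START. The sample file contents and the output file names are assumptions made up for illustration, not something from the question.

```shell
# Create a small sample file (hypothetical data for illustration).
printf '%s\n' 'START job 1' 'note: RESTART pending' 'START job 2' 'done' > infile.txt

# Plain string match: keeps any line containing START anywhere,
# which includes the RESTART line.
grep START infile.txt > contains.txt

# Anchored regular expression: ^ ties the match to the start of the line,
# so only lines that begin with START are kept.
grep '^START' infile.txt > begins.txt
```

If the marker can appear in the middle of other words, anchoring the pattern like this is usually the safer choice.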

answered Jun 03 '10 at 02:49 by whatever · edited Jul 09 '11 at 12:06 by Gareth

score 1 · Answer 3 · answered Jun 03 '10 at 01:51

Use GNU sed with the -i argument.
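A minimal sketch of that approach, assuming the question's criterion of keeping only lines that contain START. The file name and sample data are hypothetical. Because -i overwrites the file, the example attaches a .bak suffix so that sed keeps a backup copy of the original.

```shell
# Sample data (hypothetical).
printf '%s\n' 'START a' 'skip me' 'START b' > bigfile.txt

# -i.bak edits bigfile.txt in place and saves the original as bigfile.txt.bak.
# /START/!d deletes every line that does NOT match START.
sed -i.bak '/START/!d' bigfile.txt
```

After this runs, bigfile.txt holds only the matching lines and bigfile.txt.bak holds the untouched original.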

answered by Ignacio Vazquez-Abrams

score 1 · Answer 4 · answered Jun 03 '10 at 02:02

grep -v START inputfile

should work. grep is standard on both MacOS and Linux/Unix, and can be installed on MS Windows.

Option -v is for inverting the match - only output lines that do not contain the pattern (the inverse of the usual grep behaviour).
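To make the effect of -v concrete, here is a small sketch that runs the same pattern with and without it. The sample data and the output file names are assumptions for illustration only.

```shell
# Sample data (hypothetical).
printf '%s\n' 'START one' 'two' 'START three' 'four' > inputfile

# Without -v: lines that contain START.
grep START inputfile > matching.txt

# With -v: lines that do not contain START.
grep -v START inputfile > nonmatching.txt

# Every input line lands in exactly one of the two files,
# so the two line counts add up to the input's line count.
wc -l inputfile matching.txt nonmatching.txt
```

Which of the two files you want depends on whether the pattern describes the lines to keep or the lines to discard.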

answered by sleske

score 1 · Answer 5 · edited Jun 03 '10 at 06:58

For Windows Command Prompt (help find for options):

find /v "START" original_file.txt > new_file.txt

For Linux, OS X, etc. (man grep for options):

grep -v "START" original_file.txt > new_file.txt

For more complicated text matching, grep offers far more functionality than find. If you are on Windows, you can easily find a port of grep, or you can use Windows' findstr instead of find.

answered Jun 03 '10 at 02:03 by Mike Fitzpatrick
