96

I've got a rather sizable CSV file (75MB). I'm just trying to produce a graph of it, so I really don't need all of the data.

Rewording: I'd like to delete n lines, then keep one line, then delete n lines, and so on.

So if the file looked like this:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6

and n=2, then the output would be:

Line 3
Line 6

It seems like sed might be able to do this, but I haven't been able to figure out how. A bash command would be ideal, but I'm open to any solution.

Jens Erat
Computerish

7 Answers

149
~ $ awk 'NR == 1 || NR % 3 == 0' yourfile
Line 1
Line 3
Line 6

NR holds the number of records read so far. Because the default record separator (RS) is a newline, each record is one line, so NR is the current line number. An awk program has the form 'pattern { action }', and either part is optional. When only a pattern is given, awk prints the whole record ($0) for every record where the pattern is true.
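
For readers who do not know awk, the logic of that pattern can be sketched in Python. This is only an illustration of the condition, not a replacement for the one-liner; the function name keep_first_and_every_nth and its step parameter are made up for this example.

```python
def keep_first_and_every_nth(lines, step=3):
    """Mimic awk 'NR == 1 || NR % step == 0' on a list of lines."""
    kept = []
    # enumerate(..., start=1) plays the role of awk's NR, which counts from 1.
    for nr, line in enumerate(lines, start=1):
        if nr == 1 or nr % step == 0:
            kept.append(line)
    return kept


sample = ["Line %d" % i for i in range(1, 7)]
print(keep_first_and_every_nth(sample))  # ['Line 1', 'Line 3', 'Line 6']
```

Dropping the `nr == 1` test gives exactly the output asked for in the question (Line 3, Line 6).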

Selman Ulug
67

sed can also do this:

$ sed -n '1p;0~3p' input.txt
Line 1
Line 3
Line 6

man sed explains ~ as:

first~step Match every step'th line starting with line first. For example, ``sed -n 1~2p'' will print all the odd-numbered lines in the input stream, and the address 2~5 will match every fifth line, starting with the second. first can be zero; in this case, sed operates as if it were equal to step. (This is an extension.)
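
To make the quoted rule concrete, here is a small Python sketch of which line numbers a first~step address selects, as I read the man page text above. The function name sed_step_matches is hypothetical, and the sketch assumes step is a positive integer.

```python
def sed_step_matches(first, step, lineno):
    """Return True if 1-based line number `lineno` is selected by first~step.

    Assumption: step > 0. Lines before `first` never match; after that,
    every step-th line matches.
    """
    if step <= 0:
        raise ValueError("this sketch only covers a positive step")
    return lineno >= first and (lineno - first) % step == 0


# Line numbers 1..10 selected by the addresses mentioned in the quote.
print([n for n in range(1, 11) if sed_step_matches(0, 3, n)])  # [3, 6, 9]
print([n for n in range(1, 11) if sed_step_matches(1, 2, n)])  # [1, 3, 5, 7, 9]
print([n for n in range(1, 11) if sed_step_matches(2, 5, n)])  # [2, 7]
```

Since line numbers start at 1, an address with first equal to zero selects the same lines as one with first equal to step, which is what the man page describes.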

kev
25

Perl can do this too:

while (<>) {
    print  if $. % 3 == 1;
}

This program will print the first line of its input, and every third line afterwards.

To explain it a bit, <> is the line input operator, which iterates over the input lines when used in a while loop like this. The special variable $. contains the number of lines read so far, and % is the modulus operator.

This code can be written even more compactly as a one-liner, using the -n and -e switches:

perl -ne 'print if $. % 3 == 1'  < input.txt  > output.txt

The -e switch takes a piece of Perl code to execute as a command line parameter, while the -n switch implicitly wraps the code in a while loop like the one shown above.


Edit: To actually get lines 1, 3, 6, 9, ... as in the example, rather than lines 1, 4, 7, 10, ... as I first assumed you wanted, replace $. % 3 == 1 with $. == 1 or $. % 3 == 0.

7

If you want to do it with a Bash script you can try:

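#!/bin/bash

echo "Please enter the file name"
read -r fname
echo "Please enter N (every Nth line is kept, starting with the first)"
read -r n

# Count lines from 0 so that lines 1, N+1, 2N+1, ... are kept.
value=0
while IFS= read -r line
do
    if [ $(( value % n )) -eq 0 ] ; then
        # printf writes the line as-is; echo -e would interpret backslashes.
        printf '%s\n' "$line" >> new_file.txt
    fi
    value=$(( value + 1 ))
done < "$fname"
echo "Check the 'new_file.txt' that has been created in this directory"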

Save it as "read_lines.sh" and remember to give +x permissions to the bash file.

chmod +x ./read_lines.sh
slm
5

GNU coreutils split can do this, e.g.:

seq 100 | split -n r/3/3

Explanation: split -n r/i/n will take each line k (counting from 0) where (k % n) + 1 = i.

If this is a CSV file and you need to keep the header, split -n r/1/3 will do.
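
The selection rule in the explanation above can be sketched in Python as follows. This only illustrates the arithmetic (k % n) + 1 = i; it is not how split is implemented, and the function name round_robin_chunk is made up for the example.

```python
def round_robin_chunk(lines, i, n):
    """Return the lines that land in chunk i of n under round-robin distribution.

    Line k (counting from 0) belongs to chunk (k % n) + 1.
    """
    return [line for k, line in enumerate(lines) if (k % n) + 1 == i]


numbers = [str(x) for x in range(1, 11)]  # stands in for `seq 10`
print(round_robin_chunk(numbers, 3, 3))   # ['3', '6', '9']
print(round_robin_chunk(numbers, 1, 3))   # ['1', '4', '7', '10']
```

The second call shows why r/1/3 keeps a CSV header: the first line always falls into chunk 1.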

Caesar
5

A solution in pure bash, that does not spawn a process is:

{ for f in {1..2}; do read -r line; done
  while IFS= read -r line; do
    echo "$line"
    for f in {1..2}; do read -r line; done
  done; } < file

The first line skips 2 lines at the beginning of the file; the while loop then prints the next line and skips 2 lines again.

If your file is small, this is an efficient way of doing the job because it does not start a process. When your file is large, sed should be used, as it handles I/O more efficiently than bash.

jfg956
1

A Python version (works in both Python 2 and Python 3):

python2 -c "print(''.join(open('file.txt').readlines()[::3]))"

Replace [::3] with start, end, and step parameters for more control. For example, [10:36:5] outputs the lines at zero-based indices 10, 15, ..., 35, that is, lines 11, 16, ..., 36 of the file.

Note that readlines() keeps the line endings and print adds one more newline, so the output normally ends with an extra empty line. The exception is when the file's last line has no trailing newline and the step size selects it.

A stream version is possible, too (output appears only after the whole stream has been read):

python -c "import sys;print(''.join(list(sys.stdin)[::3]))" < file.txt
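
If buffering the whole input is a concern for a large file, a lazy variant can be sketched with itertools.islice, which reads and discards the skipped lines one at a time. The helper name every_nth and its parameters are hypothetical; the demo uses an in-memory stream so that it runs on its own.

```python
import io
import sys
from itertools import islice


def every_nth(stream, step=3, start=0):
    """Lazily yield every step-th line of `stream`, beginning at 0-based index `start`."""
    return islice(stream, start, None, step)


# Demo on an in-memory stream; pass sys.stdin instead to filter piped input.
demo = io.StringIO("".join("Line %d\n" % i for i in range(1, 8)))
# writelines does not append newlines, so no extra empty line is produced.
sys.stdout.writelines(every_nth(demo))
```

With the defaults this selects the same lines as [::3]; calling every_nth(stream, step=3, start=2) would instead give lines 3, 6, 9, ... as in the question.
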
MarianD
DomTomCat