I have a CSV file and want to discard a couple of columns. So let's say this is a sample file:

column a, column b, column c
value  a, value  b, value  c
value  a, "quoted, b", value c

Now let's say we want to discard column b, so that the result is:

column a, column c
value  a, value  c
value  a, value c

If it were not for the quoted string "quoted, b", I could do this with cut:

cut -d ',' -f 1,3

However, there is this quoted string. I could just load the file with LibreOffice, but besides being less cool and harder to automate, my files are several hundred MB each and some even exceed the maximum number of rows for LibreOffice Calc.

(Side note: My actual files have more like 30 columns and I'd like to select about 5-10 of them. So it is not as simple as "discard the last column".)

yankee
  • 693

1 Answer

If you can install Python and easy_install, then you can also install csvkit: https://csvkit.readthedocs.io

You can then run a simple command like the following to select only columns 1 and 3:

csvcut -c 1,3 original_file.csv > new_file.csv

Or, as another example, to remove the second column:

csvcut -C 2 original_file.csv > new_file.csv
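
If installing csvkit is not an option, roughly the same column selection can be sketched with Python's standard csv module. This is only a minimal sketch, not what csvcut does internally: the helper name select_columns and the 0-based keep list are made up for illustration, and skipinitialspace=True is an assumption added so that the sample from the question (which has a space after each comma) parses as intended. Rows are processed one at a time, so the same function should also work on large files opened with open(path, newline="").

```python
import csv
import io


def select_columns(src, dst, keep):
    """Copy only the columns at the 0-based indices in `keep` from src to dst.

    `select_columns` and `keep` are illustrative names, not part of csvkit.
    """
    # skipinitialspace=True ignores the space after each comma, so that
    # ', "quoted, b"' is read as a single quoted field.
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:  # one row at a time, so memory use stays small
        writer.writerow([row[i] for i in keep])


# The sample file from the question, held in memory for the demo.
sample = io.StringIO(
    'column a, column b, column c\n'
    'value  a, value  b, value  c\n'
    'value  a, "quoted, b", value c\n'
)
out = io.StringIO()
select_columns(sample, out, keep=[0, 2])  # keep columns 1 and 3
result = out.getvalue()
print(result)
```

Note that the output is written without the space after the comma, so it will not match the input byte for byte.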

NOTE:

Just a word of warning: your CSV looks invalid. Unless you want a space character in your data, you MUST NOT have a space after the comma/delimiter. The space becomes part of the field, and it can break the parsing of quoted text, because the opening quote is then no longer the first character of the field.

How is this data generated? Can it be generated without the extra spaces? E.g. column a,column b,column c
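
To illustrate the warning, here is a small sketch using Python's standard csv module (an assumption for demonstration purposes; other CSV parsers may behave differently). It parses the third line of the sample twice, once with default settings and once with skipinitialspace=True, which tells the reader to ignore whitespace directly after a delimiter.

```python
import csv

line = 'value  a, "quoted, b", value c'

# Default settings: the space after the comma is treated as data, so the
# quote is not at the start of the field and the comma inside the quotes
# splits the value in two.
default_fields = next(csv.reader([line]))

# skipinitialspace=True drops whitespace right after each delimiter, so
# the quoted field is recognised as a single value.
lenient_fields = next(csv.reader([line], skipinitialspace=True))

print(default_fields)  # four fields, with the quotes kept as literal text
print(lenient_fields)  # three fields, as intended
```

With default settings the line comes back as four fields instead of three, which is exactly the kind of breakage the extra spaces can cause.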

jehad
  • 1,594