50

While trying to find out how to get CSVs to open correctly in Excel on a Spanish PC (belonging to a customer) I found many answers recommending using sep=, at the top of the file:

On the face of it, this approach works but none of those answers give any further information on where this metadata option comes from. I have tried to search for what it means (and hence found all of those answers) but have been unable to get any further information especially given that punctuation is notoriously hard to search for.

My primary concern is whether or not this is an Excel-specific feature. I suspect it is but have found nothing definitive to confirm it.

Some related questions are:

  • What characters can be used in this setting?
  • What other settings are available (eg. line termination character, quote character, etc).
  • Are there any other tools that officially support this feature?

I'm hoping there is a piece of documentation somewhere that someone can point me to that will answer all these questions and more; I just haven't been able to find it.

Some clarification:

The sep= is not a parameter to a parser. It is meant to be placed inside the CSV. Example:

sep=|
"LETTER"|"ANIMAL"
"a"|"aardvark"
"b"|"bear"
"c"|"cow"

5 Answers5

20

RFC 4180 is commonly recognized as the standard for the CSV format, and does not mention any such feature.

W3C Model for Tabular Data and Metadata on the Web mentions it as a feature not within their scope:

Many "CSV" files embed metadata, for example in lines before the header row of the CSV document. This specification does not define any formats for embedding metadata within CSV files, aside from the names of columns in the header row.

So while I did not find any direct proof that this feature is Excel-specific, this apparently still is the case as there seems to be no standardization document covering that feature.

4

The instruction sep= is used in all the countries where comma is used as decimal delimiter, as Italy. Since Excel export CSV using semicolons instead of commas in those countries, if you want to be sure that the file can be read in USA or other countries, you NEED

sep=;

on top of file. This is Excel-specific and it might be ignored by other applications.

0

To contribute towards an answer to one of your questions: "Are there any other tools that officially support this feature?"

This is not supported by Apple Numbers (I've tested using Numbers version 3.6.2).

It's also not supported by csv2json (https://www.npmjs.com/package/csv2json) nor csvtojson (https://www.npmjs.com/package/csvtojson).

kintel
  • 121
-1

I tested this as well on Google Sheet and Google does not support it for the moment. But Google Sheet does not process CSV files correctly if a ";" is used as separator.

I personal think it is much easier for tool vendors to analyse the CSV file and identify the separator automatically. I did not find a tool yet which do this :(

-4

Just about any tool that can import from .csv files recognises that comma is not the only possible separator, tabs have been common for a long time for example. I think this goes back to the days of punch cards even, that there needs to be some character that is recognised as end of data field - with the exception of systems that use fixed width fields or length prefixed fields just about every system has need this since the advent of computers.

The 3 essentials are a field separator, a record separator and an end of data marker, csv by default uses comma, newline and end of file for these respectively but just about any valid input can be used, you do, however, need a mechanism for escaping the special markers if they occur within your valid data.

One historic system I came across used newline for end of field, 2 blank lines for end of record and *!*!* END *!*!* for the end of the data. I have commonly come across all of ,\t|: used as a field separator along with various control characters.

One thing to be wary of is that the French and others use comma as a decimal point - this can cause all sorts of fun and games. There is a degree of standardisation for csv files in [rfc4180][1] but there are times when you need to manually edit files when exporting from one program and importing to another.

To answer the second part of your question Python, for example, includes a [csv reader/writer][2] as one of the standard libraries that offers the following options:

  • delimiter - This is the field separator, (unless escaped).
  • doublequote - If true then a " within a field will be represented as ""
  • escapechar - removes any special meaning from any following character, e.g. if it is set to \ then quote can be represented as \"
  • lineterminator - usually one of \n, \r, \r\n or \n\r
  • quotechar - character used to quote strings, usually " or '
  • quoting - flag as one of:
  • ALL - Every field is in quotes, i.e.: "1","Free Beer","ASAP"
  • MINIMAL - Only quote fields that have special characters, i.e.: 1,"Beer, (Free)", Now, Please
  • QUOTE_NONNUMERIC - Quote anything that is not a number, i.e.: 1,"Beer, (Free)", "Now", "Please"
  • NONE - Don't quote escape instead, i.e.: 1,Beer\, (Free), Now, Please
  • skipinitialspace - skip any leading space of each field so you can use , rather than just , as a field separator.

Generally the more flexible and well thought out a tool that can import csv files is the more of these options, (possibly with differing names and/or mechanisms), that it will have. [1]: https://www.rfc-editor.org/rfc/rfc4180 [2]: https://docs.python.org/2/library/csv.html