I would like pandas read_csv to properly read the following example text into a DataFrame:
"INDEX"|"COLUMN_STRING"|"COLUMN_INTEGER"|"COLUMN_EMPTY"|"COLUMN_EMPTY_STRING"
1|"string"|21||""
In the file I need to parse, every value that should be a string is wrapped in double quotes ("").
Values that should be NaN have no quotes at all, like this: ||
I would like read_csv to keep all the quoted values as strings, including "", but
by default it converts "" to NaN.
If I use keep_default_na=False, both || and |""| come back as the empty string ''.
Also, using dtype={"COLUMN_EMPTY_STRING": str} doesn't help.
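Here is a minimal reproduction of the two behaviours described above. It assumes the sample is read from an in-memory string with sep="|" and index_col="INDEX"; the variable names are only for illustration.

```python
import io

import pandas as pd

# The sample from the question, held in memory instead of a file.
data = (
    '"INDEX"|"COLUMN_STRING"|"COLUMN_INTEGER"|"COLUMN_EMPTY"|"COLUMN_EMPTY_STRING"\n'
    '1|"string"|21||""\n'
)

# Default settings: both the unquoted empty field and the quoted "" become NaN.
df_default = pd.read_csv(io.StringIO(data), sep="|", index_col="INDEX")
print(df_default.isna())

# keep_default_na=False: both fields become the empty string '' instead.
df_keep = pd.read_csv(
    io.StringIO(data), sep="|", index_col="INDEX", keep_default_na=False
)
print(repr(df_keep.loc[1, "COLUMN_EMPTY"]), repr(df_keep.loc[1, "COLUMN_EMPTY_STRING"]))
```

Neither call distinguishes || from |""|, which is the distinction I need.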
Does anybody know the solution to this pickle?
Another possible solution would be to use quoting=3 (csv.QUOTE_NONE). This would keep the surrounding quotes, so strings come back as "string", which could be cleaned up after parsing. I cannot use it though, because I pass the index_col argument, and that raises an error: it cannot find e.g. INDEX, since the header is read from the file as "INDEX".
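For illustration, here is a rough sketch of what the quoting=3 route might look like. It assumes the index can be set after parsing with set_index rather than through index_col. The helper strip_quotes is a hypothetical name, not part of pandas. The sketch also assumes that no quoted value contains the | delimiter or an escaped quote, since with quoting disabled such values would not be parsed correctly.

```python
import csv
import io

import pandas as pd

raw = (
    '"INDEX"|"COLUMN_STRING"|"COLUMN_INTEGER"|"COLUMN_EMPTY"|"COLUMN_EMPTY_STRING"\n'
    '1|"string"|21||""\n'
)


def strip_quotes(value):
    """Hypothetical helper: drop one pair of surrounding double quotes.

    Non-string values (numbers, NaN) are returned unchanged.
    """
    if isinstance(value, str) and len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


# quoting=csv.QUOTE_NONE (the same as quoting=3) leaves the quotes in place,
# so '""' is kept as a literal two-character string while the unquoted
# empty field between || is still read as NaN.
df = pd.read_csv(io.StringIO(raw), sep="|", quoting=csv.QUOTE_NONE)

# Clean the header first, so that "INDEX" can be found by its plain name.
df.columns = [strip_quotes(name) for name in df.columns]

# Strip quotes from the non-numeric columns only; '""' becomes ''.
for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].map(strip_quotes)

# Set the index after parsing instead of passing index_col to read_csv.
df = df.set_index("INDEX")
print(df)
```

With this approach COLUMN_EMPTY stays NaN and COLUMN_EMPTY_STRING holds '', but it relies on post-processing rather than on read_csv itself.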