When using pd.read_csv('myfile.csv', delimiter=';') on a CSV that has duplicated column names, pandas mangles the duplicated columns by appending .1, .2, ... .# (where # is the number of the duplicate).
My example CSV looks like this:
| data1 | data2 | A | B | B | C | C |
|---|---|---|---|---|---|---|
| abc | NaN | text1 | text2 | text3 | text4 | text5 |
| def | 456 | text2 | text4 | text3 | text5 | text1 |

As raw CSV text:

    Data1;Data2;A;B;B;C;C
    abc;;text1;text2;text3;text4;text5
    def;456;text2;text4;text3;text5;text1

After importing into a DataFrame, the duplicated columns get mangled into B, B.1, C and C.1.
This output is expected.
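
A minimal sketch that reproduces the mangling, assuming the example CSV is loaded from an in-memory string instead of myfile.csv (the name csv_text is chosen here only for illustration):

```python
import io

import pandas as pd

# The example CSV from above, held in memory instead of read from myfile.csv.
csv_text = """Data1;Data2;A;B;B;C;C
abc;;text1;text2;text3;text4;text5
def;456;text2;text4;text3;text5;text1
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";")

# Repeated header names receive a numeric suffix on import.
print(list(df.columns))
# ['Data1', 'Data2', 'A', 'B', 'B.1', 'C', 'C.1']
```
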
But I would like to combine each set of duplicated columns into a single column, joining the values in each row into a comma-separated string.
So the desired output would look like: (order of columns is not important)
| data1 | data2 | A | B | C |
|---|---|---|---|---|
| abc | NaN | text1 | text2,text3 | text4,text5 |
| def | 456 | text2 | text4,text3 | text5,text1 |
How can I achieve that with pandas in Python?
I found the following question when searching for the problem:
Concatenate cells into a string with separator pandas python
But I don't know how to apply the answer from that question to only those columns which are mangled.
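
For reference, here is a rough sketch of one possible approach, not necessarily the best one. It assumes that a trailing ".<digits>" suffix only ever comes from pandas' mangling, so a column that is genuinely named something like "B.1" in the file would be merged by mistake. The helpers base_name and join_non_null are hypothetical names introduced for this sketch, and the CSV is again read from an in-memory string.

```python
import io
import re

import pandas as pd

csv_text = """Data1;Data2;A;B;B;C;C
abc;;text1;text2;text3;text4;text5
def;456;text2;text4;text3;text5;text1
"""
df = pd.read_csv(io.StringIO(csv_text), delimiter=";")


def base_name(col):
    # Assumption: a trailing ".<digits>" was added by pandas, not by the file.
    return re.sub(r"\.\d+$", "", col)


def join_non_null(row):
    # Join the non-missing values of one row with commas.
    return ",".join(str(v) for v in row.dropna())


# Map each original name to the list of mangled columns that share it.
groups = {}
for col in df.columns:
    groups.setdefault(base_name(col), []).append(col)

# Columns without duplicates are kept as they are; duplicated ones are joined.
combined = pd.DataFrame({
    name: df[cols[0]] if len(cols) == 1
    else df[cols].apply(join_non_null, axis=1)
    for name, cols in groups.items()
})

print(list(combined.columns))  # ['Data1', 'Data2', 'A', 'B', 'C']
print(combined["B"].tolist())  # ['text2,text3', 'text4,text3']
print(combined["C"].tolist())  # ['text4,text5', 'text5,text1']
```

Only the columns that share a base name go through the join, so Data1, Data2 and A keep their original values and dtypes.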
