Hope you are having a good day.
I have an issue with my code. I need to merge several datasets that I keep in a list called all_csv. The datasets look something like this:
Dataset 1
index   device1
GR03    1110
GR20    1121
*GR08*  1109
Dataset 2
index   device2   
*GR08*  1112
GR01    1114
GR04    1123
(In the code, the index is a column called list1.)
As you can see, the same index value sometimes appears in more than one dataset (GR08). So I wrote this:
import glob
import pandas as pd

all_files = glob.glob(path + "/*data.csv")  # Select all the *data.csv files in path
all_csv = [pd.read_csv(f, sep=',') for f in all_files]
all_csv = [df.set_index(df["list1"]) for df in all_csv]  # list1 is the index shown above
all_csv = [df.drop(df.columns[0], axis=1) for df in all_csv]  # Drop the now-duplicated list1 column
df_merged = pd.concat(all_csv, axis=1)  # Raises "Reindexing only valid with uniquely valued Index objects"
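For context: pd.concat(..., axis=1) aligns frames with an outer join on their index labels (join="outer" is the default). When the indexes differ, this error typically means that at least one individual frame repeats a key. Keys shared between frames, such as GR08, are not the cause. The sketch below rebuilds the two sample datasets and checks each frame for duplicate keys before concatenating. It uses set_index("list1") as shorthand for the set_index/drop pair and assumes list1 is the key column in every file.

```python
import pandas as pd

# Two sample frames reconstructed from the datasets in the question.
df1 = pd.DataFrame({"list1": ["GR03", "GR20", "GR08"], "device1": [1110, 1121, 1109]})
df2 = pd.DataFrame({"list1": ["GR08", "GR01", "GR04"], "device2": [1112, 1114, 1123]})

# set_index("list1") moves the key column into the index in one step.
frames = [df.set_index("list1") for df in (df1, df2)]

# Report any key that appears more than once inside a single frame;
# such keys make concat(axis=1) fail when the frames' indexes differ.
for i, f in enumerate(frames):
    dupes = f.index[f.index.duplicated()].unique().tolist()
    if dupes:
        print(f"frame {i} has duplicate keys: {dupes}")

# With unique keys per frame, concat(axis=1) outer-joins on the index:
# GR08 gets values from both frames, and keys missing from a frame become NaN.
merged = pd.concat(frames, axis=1)
print(merged)
```

If a file does repeat a key, the duplicates have to be resolved before concatenating, for example by aggregating with groupby("list1"). Which aggregation makes sense depends on the data.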
I assumed that concat requires all the datasets to have the same index values, so earlier I tried it without setting the index and got something like this:
all_files = glob.glob(path + "/*data.csv")
all_csv = [pd.read_csv(f, sep=',') for f in all_files]
df_merged = pd.concat(all_csv, axis=1)
Resulting dataset
index  device1  device2  device3
nan    1110     nan      1092
nan    1121     nan      nan
*GR08* 1109     1112     1098
nan    nan      1114     nan
nan    nan      1123     1111
This is correct, but I don't understand why the index values that are not shared by all the datasets appear as NaN.
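One likely explanation: without set_index, every frame keeps its default 0..n-1 RangeIndex. concat(axis=1) therefore lines rows up by position rather than by the list1 key, and each file's list1 column is kept as a separate column. When the files have different numbers of rows, positions that exist only in the longer files are filled with NaN, including in the list1 columns. A minimal sketch with the two sample datasets (equal length, so no NaN here, but the positional pairing is visible):

```python
import pandas as pd

df1 = pd.DataFrame({"list1": ["GR03", "GR20", "GR08"], "device1": [1110, 1121, 1109]})
df2 = pd.DataFrame({"list1": ["GR08", "GR01", "GR04"], "device2": [1112, 1114, 1123]})

# Both frames have the default RangeIndex 0, 1, 2, so rows are paired by position.
positional = pd.concat([df1, df2], axis=1)
print(positional)
# Row 0 combines GR03 from df1 with GR08 from df2, and "list1" appears twice.
```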
Does anyone have an idea how I can solve this, or a different strategy for merging the datasets?
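One different strategy is to merge the frames pairwise on list1 with an outer join, using functools.reduce. This sketch assumes every file has a list1 key column and that no file repeats a key. A key missing from a file then produces NaN only in that file's device column, and list1 itself stays filled.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"list1": ["GR03", "GR20", "GR08"], "device1": [1110, 1121, 1109]})
df2 = pd.DataFrame({"list1": ["GR08", "GR01", "GR04"], "device2": [1112, 1114, 1123]})
all_csv = [df1, df2]  # stands in for the list built with pd.read_csv

# Outer-join each frame onto the running result on the shared "list1" key.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="list1", how="outer"),
    all_csv,
)
print(merged)
```

Unlike concat, merge does not fail on repeated keys. It emits one row per matching pair, so checking for duplicates first is still worthwhile. If two files use the same device column name, merge adds the default "_x"/"_y" suffixes.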
Thanks for all your answers.
 
    