I have 4 csv files. Each file has different fields, e.g. name, id_number, etc. Each file is talking about the same thing, for which there is a unique id that each file has. So, I would like to concatenate the fields of each of the 4 files into a single DataFrame. For instance, one file contains first_name, another file contains last_name, then I want to merge those two, so that I can have first and last name for each object.
Doing that is trivial, but I'd like to know the most efficient way, or if there is some built-in function that does it very efficiently.
The files look something like this:
file1:
id   name   age pets
b13  Marge  18  cat
y47  Dan    13  dog
h78  Mark   20  lizard
file2:
id   last_name   income  city
y47  Schmidt     1800    Dallas
b13  Olson       1670    Paris
h78  Diaz        2010    London
file 3 and 4 are like that with different fields. The ids are not necessarily ordered. The goal again, is to have one DataFrame looking like this:
id   name   age pets    last_name income city
b13  Marge  18  cat     Olson     1670   Paris
y47  Dan    13  dog     Schmidt   1800   Dallas
h78  Mark   20  lizard  Diaz      2010   London
What I've done is this:
file1 = pd.read_csv('file1.csv')
file2 = pd.read_csv('file2.csv')
file3 = pd.read_csv('file3.csv')
file4 = pd.read_csv('file4.csv')
f1_group = file1.groupby(['id'])
f2_group = file2.groupby(['id'])
f3_group = file3.groupby(['id'])
f4_group = file4.groupby(['id'])
data = []
for id1, group1 in f1_group:
    for id2, group2 in f2_group:
        for id3, group3 in f3_group:
            for id4, group4 in f4_group:
                if id1 == id2 == id3 == id4:
                   frames = [group1, group2, group3, group4]
                   con = pd.concat(frames, axis=1)
                   data.append(con)  
That works but is extremely inefficient. If I could eliminate the element that has been already considered from group1, group2, etc, that would help, but it would still be inefficient.
Thanks in advance.
 
    