I want to merge two dataframes, however, the right dataframe has to be processed in chunks due to size. From the second iteration (i.e. when merging chunk2 into df) merge creates additional columns (see MWE), however, I want to merge into the old columns.
Please note that the (date) integer in column A is not a unique index in df.
import pandas as pd
df = pd.DataFrame({'A': [20170801, 20170801, 20170802, 20170901],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                'C': ['C0', 'C1', 'C2', 'C3'],
                'D': ['D0', 'D1', 'D2', 'D3']},
                index=[0, 1, 2, 3])
chunk1 = pd.DataFrame({'A': [20170801, 20170802, 4, 4],
                'E': ['B4', 'B5', 'B6', 'B7'],
                'F': ['C4', 'C5', 'C6', 'C7'],
                'G': ['D4', 'D5', 'D6', 'D7']},
                 index=[0, 1, 2, 3])
chunk2 = pd.DataFrame({'A': [20170901, 67, 68, 69],
                'E': ['B4', 'B5', 'B6', 'B7'],
                'F': ['C4', 'C5', 'C6', 'C7'],
                'G': ['D4', 'D5', 'D6', 'D7']},
                 index=[0, 1, 2, 3])
df = df.merge(chunk1, on='A', how='left')
print(df)
      A   B   C   D    E    F    G
0  20170801  B0  C0  D0   B4   C4   D4
1  20170801  B1  C1  D1   B4   C4   D4
2  20170802  B2  C2  D2   B5   C5   D5
3  20170901  B3  C3  D3  NaN  NaN  NaN
df = df.merge(chunk2, on='A', how='left')
print(df)
          A   B   C   D  E_x  F_x  G_x  E_y  F_y  G_y
0  20170801  B0  C0  D0   B4   C4   D4  NaN  NaN  NaN
1  20170801  B1  C1  D1   B4   C4   D4  NaN  NaN  NaN
2  20170802  B2  C2  D2   B5   C5   D5  NaN  NaN  NaN
3  20170901  B3  C3  D3  NaN  NaN  NaN   B4   C4   D4
The output should look like this:
      A   B   C   D    E    F    G
0  20170801  B0  C0  D0   B4   C4   D4
1  20170801  B1  C1  D1   B4   C4   D4
2  20170802  B2  C2  D2   B5   C5   D5
3  20170901  B3  C3  D3   B4   C4   D4
 
    