I am looking to merge 2 dataframes in python, df1 and df2, on 2 columns, Site and Building, with different row quantities as a way of attaining a "safe" value for each generator value in df1. Below is a demonstration code, though I've created the dataframes in the below example (which appears to work), the data for each table in the actual issue comes from SQL queries, which leads me to believe the merge is having issues due to data type.
import pandas as pd
df = {'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Greece','Greece','Greece','Greece','Greece','Greece'],
        'Building' : ['X1','X1','X1','X1','X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X4','X4','X4','X4','X4',   'X4','X5','X5','X5','X5','X5','X5','X1','X1',   'X1','X1',  'X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X1', 'X1','X1',  'X1','X1',  'X1'],
        'Generator' : ['DE','NDE',  'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE',  'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4',  'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE',    'NDE','GBX1','GBX2','GBX3','GBX4',  'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4']}
df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])
df15 = {'Building' : ['X1','X2','X3','X4','X5','X1','X2','X3','X1'],
        'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Greece'],
        'Safe' : [1,    1,  1,  1,  1,  0,  1,  1,  0]}
df2 = pd.DataFrame(df15, columns = ['Site', 'Building', 'Safe'])
df3 = df1.merge(df2, how = 'left', on = ['Site', 'Building'], indicator = True)
I've also tried changing the data types of each to string as per pandas - Merging on string columns not working (bug?),
df1['Site'] = df1['Site'].astype('str')
df1['Building']=df1['Building'].astype('str')
df1['Site'] = df1['Site'].astype('str')
df1['Building']=df1['Building'].astype('str')
and also the step that mentions checking encoding as per below, but all seem to match ie. there are no visible dependencies in content;
df1['Building'] = df1['Building'].str.encode('UTF-8')
df1['Site'] = df1['Site'].str.encode('UTF-8')
Datatypes:
df2.datatypes:
    Site                           object
    Building                       object
    Safe                           object
    dtype: object
    df1.datatypes:
    Building      object
    Site          object
    Generator     object
    dtype:        object
I've tried the below code:
df3 = df1.merge(df2, left_on = ['Site', 'Building'], right_on = ['Site', 'Building'], how = 'left', indicator = 'indicator')
or:
df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'left', indicator = 'indicator')
but the outcome ends up with only the data from left ie outcome 1.
I've tried outer join as per below, which yields outcome 2:
df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'outer', indicator = 'indicator')
Apologies for my relative ignorance with respect to pandas.