I am in my first few weeks of learning pandas and need help with a problem I am stuck with. I have 2 dataframes as listed below:
df1 = pd.DataFrame({
    'City': ['Chicago','Atlanta', 'Dallas', 'Atlanta', 'Chicago', 'Boston', 'Dallas','El Paso','Atlanta'],
    'State': ['IL','GA','TX','GA','IL','MA','TX','TX','GA'],
    'Population': [8865000,523738,6301000,523738,8865000,4309000,6301000,951000,523738]
}, columns=['City', 'State', 'Population'])
df1
    City    State   Population
0   Chicago IL     8865000
1   Atlanta GA     523738
2   Dallas  TX     6301000
3   Atlanta GA     523738
4   Chicago IL     8865000
5   Boston  MA     4309000
6   Dallas  TX     6301000
7   El Paso TX     951000
8   Atlanta GA     523738
.
df2 = pd.DataFrame({
    'Airport': ['Hartsfield','Logan','O Hare','DFW'],
    'M_Code': [78,26,52,39]
},index=[
    'Atlanta',
    'Boston',
    'Chicago',
    'Dallas'])
df2
          Airport        M_Code
Atlanta   Hartsfield     78
Boston    Logan          26
Chicago   O Hare         52
Dallas    DFW            39
Expected output is:
df1
    City    State   Population  M_Code  City_indexed_in_df2
0   Chicago IL      8865000     52      True
1   Atlanta GA      523738      78      True
2   Dallas  TX      6301000     39      True
3   Atlanta GA      523738      78      True
4   Chicago IL      8865000     52      True
5   Boston  MA      4309000     26      True
6   Dallas  TX      6301000     39      True
7   El Paso TX      951000      NaN     False
8   Atlanta GA      523738      78      True
I started with:
df1.loc[df1.City.isin(df2.index),:]
    City    State   Population
0   Chicago IL  8865000
1   Atlanta GA  523738
2   Dallas  TX  6301000
3   Atlanta GA  523738
4   Chicago IL  8865000
5   Boston  MA  4309000
6   Dallas  TX  6301000
8   Atlanta GA  523738
As expected this filters out the row with El Paso. But am not able to come up with code to do this --> For every df1.City I need to lookup on df2.index and if found:
- Extract df2.M_Code and insert the value to a new column df1.M_Code
- Insert boolean result to a new column df1.City_indexed_in_df2
Can someone help me with how I can acheive this. In addition, my thought is that creating a unique array from df1.City and then doing a lookup on df2.index may get improved performance(being a novice, I haven't figured out how to do this beyond extracting the unique array below.)
arr = df1.City.unique()
array(['Chicago', 'Atlanta', 'Dallas', 'Boston', 'El Paso'], dtype=object)
Suggestions on changing the solution approach will be great too.
 
    