I try to solve the following problem:
I have these two dataframes, df1:
| label | term | 
|---|---|
| A01.12 | "Culture de légumes, de melons, de racines et de tubercules" | 
| A01.14 | Culture de la canne à sucre | 
and df2:
| company | organization_activity | 
|---|---|
| coca-cola | "Culture de légumes, de melons, de racines et de tubercules" | 
| rum-factory | Culture de la canne à sucre | 
I now would like to map the "term" of df1 to the "organization_activity" of df2 and create a new column with the name "codes" in df2 with the respective label in df1.
The resulting dataframe should look like this:
| organization_activity | codes | 
|---|---|
| "Culture de légumes, de melons, de racines et de tubercules" | A01.12 | 
| Culture de la canne à sucre | A01.14 | 
This partially works by using:
df2['codes'] = df2['organization_activity'].map(df1.set_index('term')['label'].to_dict())
Oddly the new column is created and filled but for a lot of terms the mapping does not work and a Nan value is inserted even though the "organization_activity" I am mapping exists in the "term" column. Does anyone have an idea on why this could happen? The formatting for the values in "term" and "organization_activity" is equal and there doesn't seem to be a difference in the format of values that are mappable and ones that are not. Is there maybe a different way to do this?
The dataframe I showed is just an abstract so there are multiple companies that can have the same "organization_activity".
 
     
    