I have two DataFrames:
- df_components: list of unique components (ID, DESCRIPTION)
- dataset: several rows and columns from a CSV (one of these columns contains the description of a component).
I need to create a new column in the dataset with the ID of the component according to the df_components.
I tried to do this way:
Creating the df_components and the ID column based on the index
components = dataset["COMPDESC"].unique()
df_components = pd.DataFrame(components, columns=['DESCRIPTION'])
df_components.sort_values(by='DESCRIPTION', ascending=True, inplace=True)
df_components.reset_index(drop=True, inplace=True)
df_components.index += 1
df_components['ID'] = df_components.index
Sample output:
                                           DESCRIPTION   ID
1                                             AIR BAGS    1
2                                     AIR BAGS:FRONTAL    2
3               AIR BAGS:FRONTAL:SENSOR/CONTROL MODULE    3
4                                 AIR BAGS:SIDE/WINDOW    4
Create the COMP_ID in the dataset:
def create_component_id_column(row):
    found = df_components[df_components['DESCRIPTION'] == row['COMPDESC']]
    return found.ID if len(found.index) > 0 else None
dataset['COMP_ID'] = dataset.apply(lambda row: create_component_id_column(row), axis=1)
However this gives me the error ValueError: Wrong number of items passed 248, placement implies 1. Being 248 the number of items on df_components.
How can I create this new column with the ID from the item found on df_components?
 
    