I have a DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({
    'feature1': [34, 45, 52],
    'feature2': [1, 0, 1],
    'unparsed_features': [
        "neoclassical, heavy, $2, old, bronze",
        "romanticism, gold, $5",
        "baroque, xs, $3, new",
    ],
})
df
       feature1  feature2                     unparsed_features
    0        34         1  neoclassical, heavy, $2, old, bronze
    1        45         0                 romanticism, gold, $5
    2        52         1                  baroque, xs, $3, new
I am trying to split the column unparsed_features into 6 columns (weight, age, colour, size, price and period). As you can see, the values appear in no fixed order, and some fields are missing from some rows.
I have a general idea of the possible values for each column, as shown below (for price, '$' is the prefix a value starts with rather than a complete value):
main_dict = {
 'weight': ['heavy','light'],
 'age': ['new','old'],
 'colour': ['gold','silver','bronze'],
 'size': ['xs','s','m','l','xl','xxl','xxxl'],
 'price': ['$'],
 'period': ['renaissance','baroque','rococo','neoclassical','romanticism']
}
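To make the matching rule I have in mind concrete, here is a minimal sketch in plain Python of how a single token would map to a column. It assumes that price entries are prefixes and that every other column needs an exact match (so 's' the size is never confused with 'silver'). The names classify and PREFIX_COLUMNS are made up for illustration.

```python
main_dict = {
    'weight': ['heavy', 'light'],
    'age': ['new', 'old'],
    'colour': ['gold', 'silver', 'bronze'],
    'size': ['xs', 's', 'm', 'l', 'xl', 'xxl', 'xxxl'],
    'price': ['$'],
    'period': ['renaissance', 'baroque', 'rococo', 'neoclassical', 'romanticism'],
}

# Assumption: entries for these columns are prefixes, not complete values.
PREFIX_COLUMNS = {'price'}


def classify(token, vocab=main_dict):
    """Return the column a token belongs to, or None if nothing matches."""
    token = token.strip()
    for column, candidates in vocab.items():
        if column in PREFIX_COLUMNS:
            # str.startswith accepts a tuple of possible prefixes.
            if token.startswith(tuple(candidates)):
                return column
        elif token in candidates:
            return column
    return None


print([classify(t) for t in "neoclassical, heavy, $2, old, bronze".split(",")])
# ['period', 'weight', 'price', 'age', 'colour']
```

A token that matches nothing comes back as None, which is how I would expect unknown values to be flagged.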
Ideally, I would like my DataFrame to look like the following, with an empty value wherever a field is missing:
df
   feature1  feature2                     unparsed_features weight price  age size  colour        period
0        34         1  neoclassical, heavy, $2, old, bronze  heavy    $2  old       bronze  neoclassical
1        45         0                 romanticism, gold, $5           $5              gold   romanticism
2        52         1                  baroque, xs, $3, new           $3  new   xs               baroque
I know the first step is to split each string on the comma (including the space after it, so the pieces carry no leading whitespace), but I am lost after that:
df['unparsed_features'].str.split(', ')
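For a single cell, the split step behaves roughly like the sketch below. It uses plain Python strings rather than pandas, purely to show the tokens, and assumes the separator is always a comma optionally followed by spaces.

```python
row = "neoclassical, heavy, $2, old, bronze"

# A bare split on "," keeps the space that follows each comma.
raw_tokens = row.split(",")
print(raw_tokens)
# ['neoclassical', ' heavy', ' $2', ' old', ' bronze']

# Stripping each piece gives clean tokens that can be compared
# against the lists of known values.
tokens = [piece.strip() for piece in raw_tokens]
print(tokens)
# ['neoclassical', 'heavy', '$2', 'old', 'bronze']
```

What I am missing is the step from these unordered token lists to one column per field.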
Thank you for your help.