I'm new to pandas, but thanks to "Add column with constant value to pandas dataframe" I was able to add several columns at once with:
c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df = df.assign(**c)  # assign returns a new DataFrame, so keep the result
However, I'm trying to figure out which approach to take when the value of the new column depends on other rows of the dataframe (currently 1.2 million rows × 23 columns).
Let's simplify the df a bit to make it clearer:
Order   Orderline   Product  
1       0           Laptop  
1       1           Bag  
1       2           Mouse  
2       0           Keyboard  
3       0           Laptop  
3       1           Mouse  
I would like to add a new column that is 1 for every row of an order if that order contains at least one row with Product == Bag, and 0 otherwise.
The result would be:
Order   Orderline   Product   HasBag  
1       0           Laptop    1  
1       1           Bag       1  
1       2           Mouse     1  
2       0           Keyboard  0  
3       0           Laptop    0  
3       1           Mouse     0  
What I could do is find all the unique order numbers, filter out the subframe for each one, check its Product column for Bag, write 1 or 0 to a new column accordingly, and then replace the original subframe with the result.
There is likely a much better and more performant way to accomplish this.
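For reference, one vectorized route might look like the sketch below. It assumes the column names of the simplified example (`Order`, `Product`) and rebuilds that sample frame purely for illustration. The idea is to flag the Bag rows and then broadcast a per-order "any" back to every row with `groupby(...).transform`, which avoids looping over subframes.

```python
import pandas as pd

# Sample data mirroring the simplified example (illustration only).
df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

# True on the rows whose product is a Bag.
is_bag = df["Product"].eq("Bag")

# Per order: does any row contain a Bag? transform() broadcasts the
# answer back to every row of that order; astype(int) turns it into 1/0.
df["HasBag"] = is_bag.groupby(df["Order"]).transform("any").astype(int)

print(df)
```

An alternative under the same assumptions is `df["Order"].isin(df.loc[is_bag, "Order"]).astype(int)`, which skips the groupby entirely. Which of the two is faster on 1.2 million rows would need to be measured.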
The main reason I'm trying to do this is to flatten things down later on. Every order should become one row with some values of its products. I don't need the Bag row itself anymore, but I want my dataframe to keep whether the original order had a Bag (1) or not (0).
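To make the intended flattening concrete, here is a rough sketch of one row per order. Which product values should survive is not specified, so the `Products` column (a comma-joined list of the non-Bag products) and the helper column `IsBag` are hypothetical choices made only for this example. The sample frame again mirrors the simplified data.

```python
import pandas as pd

# Sample data mirroring the simplified example (illustration only).
df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

flat = (
    # Hypothetical helper column: 1 on Bag rows, 0 elsewhere.
    df.assign(IsBag=df["Product"].eq("Bag").astype(int))
      .groupby("Order", as_index=False)
      .agg(
          # The max of the 0/1 flags is 1 if the order had any Bag.
          HasBag=("IsBag", "max"),
          # Assumed summary: join the remaining (non-Bag) products.
          Products=("Product", lambda s: ", ".join(s[s != "Bag"])),
      )
)

print(flat)
```

With this shape the Bag rows disappear from the product summary, while `HasBag` still records whether the order originally contained one.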
Ultimately, once the data is cleaned up, it can be used as a base for scikit-learn (or that's what I hope).
 
    