An extension to my previous question. I have a source dataframe with three columns: Customer, Date and Item. I want to add a new column, Item History, containing an array of all the Items for that Customer from rows with an earlier Date. Where a customer has made multiple purchases on the same date, neither row's item should be listed in the item history for the other.
So, given this sample data:
import numpy as np
import pandas as pd

# Sample data: dates are day-first strings (dd/mm/yyyy)
df = pd.DataFrame({
    'Customer': ['Bert', 'Bert', 'Bert', 'Bert', 'Bert', 'Ernie', 'Ernie', 'Ernie', 'Ernie', 'Steven', 'Steven'],
    'Date': ['01/01/2019', '15/01/2019', '20/01/2019', '20/01/2019', '22/01/2019', '01/01/2019', '15/01/2019', '20/01/2019', '22/01/2019', '01/01/2019', '15/01/2019'],
    'Item': ['Bread', 'Cheese', 'Apples', 'Pears', 'Toothbrush', 'Toys', 'Shellfish', 'Dog', 'Yoghurt', 'Toilet', 'Dominos'],
})
Customer    Date    Item
Bert    01/01/2019  Bread
Bert    15/01/2019  Cheese
Bert    20/01/2019  Apples
Bert    20/01/2019  Pears
Bert    22/01/2019  Toothbrush
Ernie   01/01/2019  Toys
Ernie   15/01/2019  Shellfish
Ernie   20/01/2019  Dog
Ernie   22/01/2019  Yoghurt
Steven  01/01/2019  Toilet
Steven  15/01/2019  Dominos
The output I'd like to see would be:
Customer    Date    Item        Item History
Bert    01/01/2019  Bread       NaN
Bert    15/01/2019  Cheese      [Bread]
Bert    20/01/2019  Apples      [Bread, Cheese]
Bert    20/01/2019  Pears       [Bread, Cheese]
Bert    22/01/2019  Toothbrush  [Bread, Cheese, Apples, Pears]
Ernie   01/01/2019  Toys        NaN
Ernie   15/01/2019  Shellfish   [Toys]
Ernie   20/01/2019  Dog         [Toys, Shellfish]
Ernie   22/01/2019  Yoghurt     [Toys, Shellfish, Dog]
Steven  01/01/2019  Toilet      NaN
Steven  15/01/2019  Dominos     [Toilet]
Note that for Bert's purchases on 20/01/2019, neither's History column contains the other's item. For his 22/01/2019 purchase, both of the items from 20/01/2019 are included.
The answer to the previous question is a nifty bit of list comprehension, in the form:
df['Item History'] = [x.Item[:i].tolist() for j, x in df.groupby('Customer') 
                                          for i in range(len(x))]
df.loc[~df['Item History'].astype(bool), 'Item History']= np.nan
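But the slice end "i" in x.Item[:i] can no longer just be the current row's position: it needs to stop right after the last row for that Customer whose Date is earlier than the current row's Date, so that same-date rows are excluded. Any advice on achieving that is much appreciated.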
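To make the requirement concrete, here is a rough sketch of one direction that might work. It is not the approach from the previous answer. It assumes the Date strings are day-first (dd/mm/yyyy) and that it is acceptable to sort the rows by Customer and Date first. The names add_item_history, _date and cut are made up for illustration. The idea is that, within one customer's date-sorted rows, np.searchsorted with side='left' gives the number of rows with a strictly earlier date, which is exactly the slice end wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Customer': ['Bert', 'Bert', 'Bert', 'Bert', 'Bert', 'Ernie', 'Ernie',
                 'Ernie', 'Ernie', 'Steven', 'Steven'],
    'Date': ['01/01/2019', '15/01/2019', '20/01/2019', '20/01/2019',
             '22/01/2019', '01/01/2019', '15/01/2019', '20/01/2019',
             '22/01/2019', '01/01/2019', '15/01/2019'],
    'Item': ['Bread', 'Cheese', 'Apples', 'Pears', 'Toothbrush', 'Toys',
             'Shellfish', 'Dog', 'Yoghurt', 'Toilet', 'Dominos'],
})


def add_item_history(frame):
    """Hypothetical helper: list each row's strictly-earlier items per customer."""
    out = frame.copy()
    # Assumption: dates are day-first strings, so parse them explicitly.
    out['_date'] = pd.to_datetime(out['Date'], format='%d/%m/%Y')
    # Sort so each customer's rows are contiguous and in date order.
    out = out.sort_values(['Customer', '_date'])

    histories = []
    for _, grp in out.groupby('Customer', sort=False):
        dates = grp['_date'].to_numpy()
        items = grp['Item'].tolist()
        # For each row, count the rows in this group with a strictly earlier date.
        cut = np.searchsorted(dates, dates, side='left')
        # No earlier rows -> NaN, otherwise the items before the cut.
        histories.extend(items[:c] if c else np.nan for c in cut)

    out['Item History'] = pd.Series(histories, index=out.index, dtype=object)
    return out.drop(columns='_date')


result = add_item_history(df)
print(result)
```

On the sample data this appears to give [Bread, Cheese] for both of Bert's 20/01/2019 rows and all four earlier items for his 22/01/2019 row, but I have not checked it against anything larger.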