I have a slightly odd pandas group by question.
I have a source dataframe, which has three columns: Customer, Date and Item. I want to add a new column that contains Item History, being an array of all the Items for that Customer that are in earlier (defined by the Date) rows. For example given this source dataframe:
Customer    Date    Item
Bert    01/01/2019  Bread
Bert    15/01/2019  Cheese
Bert    20/01/2019  Apples
Bert    22/01/2019  Pears
Ernie   01/01/2019  Buzz Lightyear
Ernie   15/01/2019  Shellfish
Ernie   20/01/2019  A pet dog
Ernie   22/01/2019  Yoghurt
Steven  01/01/2019  A golden toilet
Steven  15/01/2019  Dominoes
I want to create this history feature:
Customer    Date    Item    Item History
Bert    01/01/2019  Bread   NaN
Bert    15/01/2019  Cheese  [Bread]
Bert    20/01/2019  Apples  [Bread, Cheese]
Bert    22/01/2019  Pears   [Bread, Cheese, Apples]
Ernie   01/01/2019  Buzz Lightyear  NaN
Ernie   15/01/2019  Shellfish   [Buzz Lightyear]
Ernie   20/01/2019  A pet dog   [Buzz Lightyear, Shellfish]
Ernie   22/01/2019  Yoghurt [Buzz Lightyear, Shellfish, A pet dog]
Steven  01/01/2019  A golden toilet NaN
Steven  15/01/2019  Dominoes    [A golden toilet]
I can do the following to get the History by date:
df.groupby(['Customer', 'Date']).agg(lambda x: tuple(x)).applymap(list).reset_index()
So that where a customer purchased multiple items on a single day, they're all listed in an array and where a customer purchased just one item that's alone in its own array, but I can't figure out how to concatenate them with the earlier rows.
 
    