I am trying to append a column called time to my pre-existing dataFrame df1. The values for my new column time are derived using the unique identifier column snapshot which exists in both df1 and df2 to link them.
I've tried to leverage both apply and lambda to help me create this column but it doesn't work.
df1 = pd.DataFrame({'product' : ['PENS', 'PENS', 'PENS', 'PENS', 'PENS','STAPLER','STAPLER'],
'price' : [20,25,35,40,70,100,140],
'snapshot' : [1,2,3,4,5,1,2]})
???df1
price product snapshot
0 20 PENS 1
1 25 PENS 2
2 35 PENS 3
3 40 PENS 4
4 70 PENS 5
5 100 STAPLER 1
6 140 STAPLER 2
df2 = pd.DataFrame({'snapshot' : [1,2,3,4,5],
'publish_time' : ['10/10/2005', '2/19/2007', '6/20/2007', '7/10/2010', '7/15/2010']})
>>>df2
publish_time snapshot
0 10/10/2005 1
1 2/19/2007 2
2 6/20/2007 3
3 7/10/2010 4
4 7/15/2010 5
df1['time'] = df1['snapshot'].apply(lambda x : df2['publish_time'].loc[df2['snapshot'] == x])
ValueError: Wrong number of items passed 5, placement implies 1
Ideally what I want is this something like this:
>>>df1
price product time snapshot
0 20 PENS 10/10/2005 1
1 25 PENS 2/19/2007 2
2 35 PENS 6/20/2007 3
3 40 PENS 7/10/2010 4
4 70 PENS 7/15/2010 5
5 100 STAPLER 10/10/2005 1
6 140 STAPLER 2/19/2007 2