I have two DataFrames, test1 and test2. For each ID in test2, I want to compare the date in test2 against the date ranges (date1 to date2) for that same ID in test1. If any of the dates in test2 fall within a date range in test1, I want to sum the matching amount values and store that sum in an additional column in test1.
Output:
The new test1 DataFrame should have an amount_sum column. For each row, it holds the sum of all amounts in test2 whose ID matches that row's ID and whose date falls within that row's date1 to date2 range.
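One way to get this, sketched under the assumption that "within the range" includes both endpoints: merge the two frames on ID, keep only the pairs where the test2 date lies between date1 and date2, then sum amount per test1 row. The helper name `add_amount_sum`, the temporary `_row` column, and the small frames below are hypothetical, made up for illustration. Rows with no match get 0 rather than NaN, which is also an assumption about the desired output.

```python
import pandas as pd

# Small hand-made frames standing in for the random test1/test2 (hypothetical data)
test1 = pd.DataFrame({
    "ID": ["abc", "abc", "bcd", "cab"],
    "date1": pd.to_datetime(["2018-01-01", "2018-03-01", "2018-01-01", "2018-01-01"]),
    "date2": pd.to_datetime(["2018-01-31", "2018-03-31", "2018-12-31", "2018-12-31"]),
})
test2 = pd.DataFrame({
    "ID": ["abc", "abc", "abc", "bcd", "dda"],
    "amount": [1, 2, 5, 10, 3],
    "date": pd.to_datetime(["2018-01-15", "2018-02-15", "2018-03-31", "2018-06-01", "2018-06-01"]),
})


def add_amount_sum(test1: pd.DataFrame, test2: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of test1 with an amount_sum column (hypothetical helper)."""
    # Tag each test1 row with its position so per-row sums can be mapped back
    left = test1[["ID", "date1", "date2"]].assign(_row=range(len(test1)))
    # Pair every test1 row with every test2 row that shares the same ID
    pairs = left.merge(test2[["ID", "date", "amount"]], on="ID", how="inner")
    # Keep only pairs whose test2 date lies in [date1, date2], endpoints included
    in_range = pairs["date"].between(pairs["date1"], pairs["date2"])
    sums = pairs.loc[in_range].groupby("_row")["amount"].sum()
    out = test1.copy()
    # Rows without any in-range match are filled with 0 instead of NaN
    out["amount_sum"] = sums.reindex(range(len(test1)), fill_value=0).to_numpy()
    return out


result = add_amount_sum(test1, test2)
print(result[["ID", "amount_sum"]])  # amount_sum is [1, 5, 10, 0] for the four rows
```

The merge materializes every same-ID pair, which is cheap for 100-row frames like the ones in the question but can grow large if many rows share an ID.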
```python
import random
import string

import numpy as np
import pandas as pd

base_dates = ['01-01-2018', '05-01-2018', '06-01-2018', '08-01-2018', '09-01-2018']

test1 = pd.DataFrame({
    # 100 random 3-letter IDs built from the letters 'abcd'
    'ID': [''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
    # Range start: a base date plus 0-99 days
    'date1': [pd.to_datetime(random.choice(base_dates)) + pd.DateOffset(days=int(np.random.randint(0, 100))) for n in range(100)],
    # Range end: a base date plus 101-199 days
    'date2': [pd.to_datetime(random.choice(base_dates)) + pd.DateOffset(days=int(np.random.randint(101, 200))) for n in range(100)],
})
test2 = pd.DataFrame({
    'ID': [''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
    'amount': [random.choice([1, 2, 3, 5, 10]) for n in range(100)],
    'date': [pd.to_datetime(random.choice(base_dates)) + pd.DateOffset(days=int(np.random.randint(0, 100))) for n in range(100)],
})
```
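For frames this small, a plain row-by-row loop is also workable and easy to check by hand: for each test1 row, build a boolean mask over test2 for "same ID and date between date1 and date2" and sum the masked amounts. This is a brute-force sketch that assumes inclusive endpoints; the function name `amount_sum_rowwise` and the toy data are hypothetical. It costs one pass over test2 per test1 row, so it scales poorly compared with a merge-based approach.

```python
import pandas as pd


def amount_sum_rowwise(test1: pd.DataFrame, test2: pd.DataFrame) -> pd.Series:
    """For each test1 row, sum test2 amounts with the same ID and a date in [date1, date2]."""
    totals = []
    for id_, start, end in zip(test1["ID"], test1["date1"], test1["date2"]):
        # Same ID and date inside the row's range (both endpoints included)
        mask = (test2["ID"] == id_) & test2["date"].between(start, end)
        totals.append(int(test2.loc[mask, "amount"].sum()))
    return pd.Series(totals, index=test1.index, name="amount_sum")


# Hypothetical toy data for a quick check
toy1 = pd.DataFrame({
    "ID": ["aaa", "aaa", "bbb"],
    "date1": pd.to_datetime(["2018-05-01", "2018-07-01", "2018-01-01"]),
    "date2": pd.to_datetime(["2018-06-30", "2018-08-31", "2018-02-28"]),
})
toy2 = pd.DataFrame({
    "ID": ["aaa", "aaa", "bbb", "bbb"],
    "amount": [2, 3, 10, 1],
    "date": pd.to_datetime(["2018-06-01", "2018-06-30", "2018-03-01", "2018-01-10"]),
})
totals = amount_sum_rowwise(toy1, toy2)
print(totals.tolist())  # [5, 0, 1]
```

With the generated frames above, `test1["amount_sum"] = amount_sum_rowwise(test1, test2)` attaches the column directly, because the returned Series shares test1's index.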