I have two tables, x and y. x has customer_id and reporting_date and y contains customer_id and trigger_date.
For each row in x I want to join the row in y whose trigger_date is the nearest to the reporting_date in x. I have hard-coded my desired result as z in the example below, which should run in the terminal out-of-the-box.
import pandas as pd
import numpy as np
x = pd.DataFrame({
'customer_id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4],
'reporting_date': [
'2021-09-01',
'2021-10-01',
'2021-11-01',
'2019-01-01',
'2019-02-01',
'2021-12-01',
'2022-01-01',
'2022-02-01',
'2022-03-01',
'2022-04-01',
'2022-07-01',
'2022-08-01',
'2022-09-01'
],
'payment_status': [0, 0, 2, 0, 2, 0, 0, 0, 0, 2, 0, 0, 2]
})
y = pd.DataFrame({
'customer_id': [1, 1, 3, 3, 3, 4],
'trigger_date': [
'2019-08-21',
'2021-09-17',
'2020-01-01',
'2022-01-03',
'2022-03-08',
'2022-10-01']})
# DESIRED RESULT IS z:
z = x.__deepcopy__()
z['nearest_trigger_date'] = [
'2019-09-17',
'2019-09-17',
'2019-09-17',
np.nan,
np.nan,
'2022-01-03',
'2022-01-03',
'2022-01-03',
'2022-03-08',
'2022-03-08',
'2022-10-01',
'2022-10-01',
'2022-10-01' ]
x['reporting_date'] = x['reporting_date'].apply(pd.to_datetime)
y['trigger_date'] = y['trigger_date'].apply(pd.to_datetime)
z['nearest_trigger_date'] = z['nearest_trigger_date'].apply(pd.to_datetime)
EDIT This is not a duplicate of Merging series of pandas dataframe into single dataframe. That is a concatenation, not a "merge" or "join". And pd.merge_asof only supports merging on one column.