I want to merge two datasets that are indexed by time and id. The problem is, the time is slightly different in each dataset. In one dataset, the time (Monthly) is mid-month, so the 15th of every month. In the other dataset, it is the last business day. This should still be a one-to-one match, but the dates are not exactly the same.
My approach is to shift mid-month dates to business day end-of-month dates.
Data:
dt = pd.date_range('1/1/2011','12/31/2011', freq='D')
dt = dt[dt.day == 15]
lst = [1,2,3]
idx = pd.MultiIndex.from_product([dt,lst],names=['date','id'])
df = pd.DataFrame(np.random.randn(len(idx)), index=idx)
df.head()
output:
0
date id
2011-01-15 1 -0.598584
2 -0.484455
3 -2.044912
2011-02-15 1 -0.017512
2 0.852843
This is what I want (I removed the performance warning):
In[83]:df.index.levels[0] + BMonthEnd()
Out[83]:
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
'2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
'2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
dtype='datetime64[ns]', freq='BM')
However, indexes are immutable, so this does not work:
In: df.index.levels[0] = df.index.levels[0] + BMonthEnd()
TypeError: 'FrozenList' does not support mutable operations.
The only solution I've got is to reset_index(), change the dates, then set_index() again:
df.reset_index(inplace=True)
df['date'] = df['date'] + BMonthEnd()
df.set_index(['date','id'], inplace=True)
This gives what I want, but is this the best way? Is there a set_level_values() function (I didn't see it in the API)?
Or maybe I'm taking the wrong approach to the merge. I could merge the dataset with keys df.index.get_level_values(0).year, df.index.get_level_values(0).month and id but this doesn't seem much better.