0

I have two datetime columns in different dataframes, one of general dates and one of holiday dates. I want to add a new column to the first dataframe stating whether or not the date in that row is a holiday.

df = pd.DataFrame({'date': [pd.Timestamp("2018-01-01"), pd.Timestamp("2018-01-02")]})
holidays = pd.DataFrame({'date': [pd.Timestamp("2018-01-01"), pd.Timestamp("2018-12-25")]})

This is my current method:

holiday_dates = holidays.date
df['holiday'] = df.date.map(lambda x: sum(holiday_dates.isin([x])) > 0)

It works, giving the desired output below but is incredibly slow for my data set of around 100,000 entries.

In[]: df
Out[]:
        date  holiday
0 2018-01-01     True
1 2018-01-02    False

Is there a more efficient way of doing this operation?

KyleL
  • 855
  • 6
  • 24
  • I'd say this is not a duplicate as it is about finding a more efficient, not just asking how to do something. Although the answers to both are the same... – KyleL Nov 16 '18 at 15:23

1 Answers1

2

I want to add a new column to the first dataframe stating whether or not the date in that row is a holiday.

You should be able to use .isin() directly for this:

In [7]: df['holiday'] = df['date'].isin(holidays['date'])                       

In [8]: df                                                                      
Out[8]: 
        date  holiday
0 2018-01-01     True
1 2018-01-02    False
Brad Solomon
  • 38,521
  • 31
  • 149
  • 235
  • 1
    I benchmarked this as taking `0.04s` compared to the original `647s`. A definite improvement! – KyleL Nov 16 '18 at 15:20