I am trying to count how many times a particular company appeared in the news during the year leading up to its earnings date, and to compare that count against the counts for other companies over the same period. I have two pandas DataFrames, one with earnings dates and the other with news items. My current method is slow. Is there a better pandas/NumPy way?
import pandas as pd
companies = pd.DataFrame({'CompanyName': ['A', 'B', 'C'], 'EarningsDate': ['2013/01/15', '2015/03/25', '2017/05/03']})
companies['EarningsDate'] = pd.to_datetime(companies.EarningsDate)
news = pd.DataFrame({'CompanyName': ['A', 'A', 'A', 'B', 'B', 'C'],
                     'NewsDate': ['2012/02/01', '2013/01/10', '2015/05/13', '2012/05/23', '2013/01/03', '2017/05/01']})
news['NewsDate'] = pd.to_datetime(news.NewsDate)
companies looks like 
    CompanyName EarningsDate
0   A           2013-01-15
1   B           2015-03-25
2   C           2017-05-03
news looks like 
    CompanyName NewsDate
0   A           2012-02-01
1   A           2013-01-10
2   A           2015-05-13
3   B           2012-05-23
4   B           2013-01-03
5   C           2017-05-01
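How can I rewrite the loop below? It works, but it is very slow because each DataFrame has more than 500k rows. For each earnings row it counts the company's own news items in the preceding 12 months, plus the average number of news items per other company that had any news in that window.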
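company_count = []
other_count = []
for _, company in companies.iterrows():
    # 12-month window ending at the earnings date (both bounds exclusive)
    end_date = company.EarningsDate
    start_date = end_date - pd.DateOffset(years=1)
    subset = news[(news.NewsDate > start_date) & (news.NewsDate < end_date)]
    # split the window into this company's news and everyone else's
    mask = subset.CompanyName == company.CompanyName
    company_count.append(subset[mask].shape[0])
    # mean news count per other company; NaN when no other company has news
    other_count.append(subset[~mask].groupby('CompanyName').size().mean())
# align on companies.index so a non-default index does not produce NaN
companies['12MonCompanyNewsCount'] = pd.Series(company_count, index=companies.index)
companies['12MonOtherNewsCount'] = pd.Series(other_count, index=companies.index).fillna(0)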
Final result, companies looks like 
    CompanyName EarningsDate    12MonCompanyNewsCount   12MonOtherNewsCount
0   A           2013-01-15      2                       2.0
1   B           2015-03-25      0                       0.0
2   C           2017-05-03      1                       0.0
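One possible direction, as a rough sketch rather than a tested solution at 500k rows: loop over the distinct companies in news instead of over every earnings row, and use np.searchsorted on each company's sorted news dates to count items inside every earnings window at once. The running arrays total, active and own, and the output frame result, are names made up for this illustration. The sketch assumes the same exclusive window bounds as the loop above, and its cost still grows with (number of distinct news companies) x (number of earnings rows), so it may or may not be fast enough on the real data.

```python
import numpy as np
import pandas as pd

companies = pd.DataFrame({
    'CompanyName': ['A', 'B', 'C'],
    'EarningsDate': pd.to_datetime(['2013/01/15', '2015/03/25', '2017/05/03']),
})
news = pd.DataFrame({
    'CompanyName': ['A', 'A', 'A', 'B', 'B', 'C'],
    'NewsDate': pd.to_datetime(['2012/02/01', '2013/01/10', '2015/05/13',
                                '2012/05/23', '2013/01/03', '2017/05/01']),
})

# Window bounds for every earnings row, as plain NumPy arrays.
end = companies['EarningsDate'].to_numpy()
start = (companies['EarningsDate'] - pd.DateOffset(years=1)).to_numpy()
names = companies['CompanyName'].to_numpy()

n = len(companies)
total = np.zeros(n, dtype=np.int64)   # news items from all companies in each window
active = np.zeros(n, dtype=np.int64)  # companies with at least one item in each window
own = np.zeros(n, dtype=np.int64)     # news items for the row's own company

for name, dates in news.groupby('CompanyName')['NewsDate']:
    d = np.sort(dates.to_numpy())
    # Items strictly inside (start, end): index of first date >= end
    # minus index of first date > start.
    cnt = (np.searchsorted(d, end, side='left')
           - np.searchsorted(d, start, side='right'))
    total += cnt
    active += (cnt > 0).astype(np.int64)
    is_own = names == name
    own[is_own] = cnt[is_own]

# Remove the row's own company from the totals, then average over the
# remaining companies that had news; rows with none stay at 0.
other_total = total - own
other_active = active - (own > 0).astype(np.int64)
other_mean = np.divide(other_total, other_active,
                       out=np.zeros(n), where=other_active > 0)

result = companies.copy()
result['12MonCompanyNewsCount'] = own
result['12MonOtherNewsCount'] = other_mean
print(result)
```

On the sample data this should reproduce the counts from the loop (2, 0, 1 for the company's own news and 2, 0, 0 for the others). The three running arrays keep memory proportional to the number of earnings rows, instead of building a full rows-by-companies count matrix.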
 
     
    