Say I have a dataframe as follows, representing accounts who are subscribed to a particular service, the date the subscription began, and the date the subscription ended (or null, if it hasn't ended yet).
ACCOUNT_ID   START_DATE   END_DATE
1            2018-05-01   2018-09-07
2            2018-07-20   2018-08-10
3            2018-06-10   null
My goal is to generate a table that belows, that has, for each month, how many people were subscribed to the service on the first day of that month.
DATE        SUBSCRIBER_COUNT
2018-01-01    0
2018-02-01    0
2018-03-01    0
2018-04-01    0
2018-05-01    1
2018-06-01    2
2018-07-01    3
2018-08-01    3
2018-09-01    2
2018-10-01    1
2018-11-01    1
Approaches I have tried:
- Somehow explode the dataframe such that for each account, I have multiple rows, one row for each day the account has been a subscriber. Then filter that to the days that I am interested in (i.e., first day of each month), then groupby day and count the accounts.
- Not sure how to generate such an exploded dataset without using nasty UDFs.
 
 - Generate a list of the days that I am interested in; for each one, add a new boolean column to the original dataframe named 
subscribed_yyyymmddby doingstart < day && day < end. Then boolcount these columns to get the subscriber counts.- It's clunky to have to add new columns, and also doesn't scale well if instead I wanted to do this daily not monthly