I would have the following question:
I have about 30 million rows of data. The data has six columns:
DISTINCT_IRECIPIENTID | ORDERNUMBER | ORDERDATE | ORDERDATE_OF_NEXT_ORDER | RETAINED_OR_NOT
"RETAINED_OR_NOT" has three categories:
- "Retained for one year" if the difference between ORDERDATE and ORDERDATE_OF_NEXT_ORDER is <=365 days, 
- Next_purchase_but_not_retained if the difference between ORDERDATE -ORDERDATE_OF_NEXT_ORDER is >365 days 
- "Only one lifetime purchase" if that person only placed one lifetime order 
I now want to calculate the # of years a consumer is retained in a row.
So if a customer placed four orders, and 3 consecutive ones are each paced at a time difference of <=1 year while the difference between the 3rd and the 4th is > 1 year then the value should be counted as 3 for the first three ones and 0 for the last one.
The data is already ordered by DISTINCT_IRECIPIENTID by ORDERDATE DESC
I basically wrote the following code, but the execution is extremely slow. Would anyone have some ideas on how to make the code more efficient?
def find_consecutive_purchases_in_a_year(input):
    count = 0
    exit_loop = 0
    sub_data = prepared_main_data_backup[prepared_main_data_backup['DISTINCT_IRECIPIENTID']
                                         == input]
    for index, row in sub_data.iterrows():
        if exit_loop == 1:
            return count
        if exit_loop == 0:
            if row['RETAINED_OR_NOT'] == 'retained_for_one_year':
                count += 1
            else:
                exit_loop = 1
    return count
data_test = prepared_main_data_backup
data_test['retain_counter'] = data_test['DISTINCT_IRECIPIENTID'].apply(
    find_consecutive_purchases_in_a_year)
Please find below some sample data.
    DISTINCT_IRECIPIENTID   TSORDERDATETIME FIRST_TRANS_DATE    ORDER_DATE_AFTER    DIFFERENCE_BETWEEN_ORDERS   RETAINED_OR_NOT Output
    1   2017-04-24-09.33.21.000000  2017-04-24-09.33.21.000000          only one lifetime purchase  0
    2   2017-04-24-09.35.16.000000  2017-04-24-09.35.16.000000          only one lifetime purchase  0
    3   2017-04-27-14.45.48.000000  2017-04-27-14.45.48.000000  2017-04-29-14.53.46.000000  2   retained_for_one_year   2
    3   2017-04-29-14.53.46.000000  2017-04-27-14.45.48.000000  2017-05-10-09.06.25.000000  11  retained_for_one_year   2
    3   2017-05-10-09.06.25.000000  2017-04-27-14.45.48.000000  2018-09-22-05.54.07.000000  500 next_purchase_but_not_retained  0
    3   2018-09-22-05.54.07.000000  2017-04-27-14.45.48.000000  2020-09-12-19.12.59.000000  721 next_purchase_but_not_retained  0
    3   2020-09-12-19.12.59.000000  2017-04-27-14.45.48.000000  2020-09-14-11.49.33.000000  2   retained_for_one_year   2
    3   2020-09-14-11.49.33.000000  2017-04-27-14.45.48.000000  2021-06-08-07.18.42.000000  267 retained_for_one_year   2
    4   2017-04-24-09.35.27.000000  2017-04-24-09.35.27.000000  2017-04-30-12.00.14.000000  6   retained_for_one_year   1
    4   2017-04-30-12.00.14.000000  2017-04-24-09.35.27.000000  2018-06-18-09.15.23.000000  414 next_purchase_but_not_retained  0
Any input is much appreciated!
