I have two dataframes. One contains data on user subscriptions, another on user session.
Example of subscription data (df_subscriptions):
        user_id                                 created                     ended
10238   140baa7a-1641-41b5-a85b-c43dc9e12699    2021-08-13 19:37:11.373039  2021-09-12 19:37:11.373039
10237   fbfa999c-9c56-4f06-8cf9-3c5deb32d5d2    2021-08-13 15:25:07.149982  2021-09-12 15:25:07.149982
6256    a55e64b0-a783-455e-bd9d-edbb4815786b    2021-08-13 18:31:36.083681  2021-09-12 18:31:36.083681
6257    ca2c0ee1-9810-4ce7-a2ec-c036d0b8a380    2021-08-13 16:29:52.981836  2021-09-12 16:29:52.981836
7211    24378efd-e821-4a51-a3e6-39c30243a078    2021-08-13 19:58:19.434908  2021-09-12 19:58:19.434908
Example of session data:
            user_id                                 session_start           session_duration
11960653    6f51df1a-8c2b-4ddb-9299-b36f250b05dc    2020-01-05 11:39:29.367 165.880005
80076       697e1c0a-c026-4104-b13f-1fd74eec5890    2021-01-31 02:16:33.935 22.883301
1577621     02b23671-8ce3-452b-b551-03b5ea7dce47    2021-05-18 02:07:32.589 4.283300
1286532     a506fb53-3505-44db-880a-27ad483151f0    2020-07-29 16:47:51.908 51.000000
18875432    1ea77db5-fe4a-414f-ba47-1f448175df3f    2020-10-17 04:00:35.269 360.733307
I need to calculate the total time user spend on the service while his subscription is active. The code below gives me correct/expected result, but takes A LOT of time on the real data:
def sessions_during_sub (user_id, start_date, end_date):
    result = df_sessions.loc[(df_sessions.user_id == user_id)&
                             (df_sessions.session_start >= start_date)&
                             (df_sessions.session_start <= end_date)].session_duration.sum()
    return result
df_subscriptions['sessions'] = df_subscriptions.apply(lambda x: sessions_during_sub(x['user_id'], x['created'], x['ended']), axis=1)
Is there any way to to do it proper pandas way/vectorized? Any ideas of how to speed it up really.