I'm trying to calculate the time between a username's status changes from Original Install/ Reinstall to Uninstall (and vice versa) on an app. I've been programming in Python for around a year or so, so my functions are sometimes slower or not as optimal; I'm having a similar issue with this task.
I used the following loop to calculate the time difference between states. The major issue is that this function is horrendously slow (for the 700k rows of data I'm working with), and it doesn't return the all of the time intervals.
# Import modules
import pandas as pd
import numpy as np
# Load sample data
data = pd.read_csv('username_status_dates.csv').drop(columns='index')
data['Date'] = pd.to_datetime(data['Date'], format='%Y-%M-%d')
# Create dataframe to record time intervals
customerStatuses_df = pd.DataFrame(columns=['username', 'status1', 'status2', 'timeToUninstall', 'timeToReinstall'])
# Iterate over data, calculate time intervals (by username), and add to customerStatuses
for user in data:
    # Get user sub data
    user_df = data.loc[ customerStatuses_df['Username'] == user ]
    dates = list(user_df['Date'])
    status = list(user_df['Status'])
    for val in np.arange(0, len(status) - 1):
        # If status in installed (original or re)
        if (status[val] == 'Original Install') or (status[val] == 'Reinstall'):
            # If next status is uninstall (un)
            if status[val + 1] == 'Uninstall':
                # Calculate time interval between statuses
                d_unin = dates[val + 1] - dates[val]
                # Add to customerStatuses
                customerStatuses_df = customerStatuses_df.append({'username': user, 'status1': status[val], 'status2': status[val+1], 'timeToUninstall': d_unin}, ignore_index=True)
        # If status in not installed (un)      
        elif (status[val] == 'Uninstall'):
            # If next status is installed (re)
            if status[val + 1] == 'Reinstall':
                # Calculate time interval between statuses
                d_rein = dates[val + 1] - dates[val]
                # Add to customerStatuses
                 customerStatuses_df = customerStatuses_df.append({'username': user, 'status1': status[val], 'status2': status[val+1], 'timeToReinstall': d_rein}, ignore_index=True)
sample data: username_status_dates.csv
   Username         Status                  Date
0   Joey74      Original Install    2019-01-20 00:09:00
1   Sam22       Original Install    2019-01-15 00:10:00
2   Chloe35     Uninstall           2019-01-30 00:10:00
3   Sam22       Uninstall           2019-01-01 00:11:00
4   Zoe44       Reinstall           2019-01-10 00:12:00
5   Joey74      Reinstall           2019-01-20 00:12:00
6   Shelby99    Original Install    2020-01-07 00:01:00
7   Joey74      Uninstall           2020-01-05 00:01:00
8   Joey74      Reinstall           2020-01-30 00:01:00
9   Susane11    Reinstall           2020-01-04 00:02:00
result from function: customerStatuses_df
username    status1 status2 timeToUninstall timeToReinstall
0   Joey74  Reinstall   Uninstall   349 days 23:49:00   NaN
1   Joey74  Uninstall   Reinstall   NaT 25 days 00:00:00
2   Sam22   Original Install    Uninstall   -14 days +00:01:00  NaN
What am I missing, here?
 
     
    