I have a pandas data frame, df, that has second-by-second data (Longitude, Latitude, etc.) for each driver. The data frame consists of several trips. There is a feature called Event_Type that can be used to determine the start and end of trips:
ignitionOnList = df[df['Event_Type'] == 'Ignition On'].index.tolist()
ignitionOffList = df[df['Event_Type'] == 'Ignition Off'].index.tolist()
So, imagine I have 5 trips in this data frame. The lengths of ignitionOnList and ignitionOffList would both be 5. I'd like to analyse each trip separately and store the results in a pandas data frame. Here's what I do:
import pandas as pd

dfTrips = pd.DataFrame({'Date' : [], 'Vehicle' : [], 'Trip_Number' : [], 'Start_Time' : [], 'Duration' : [],
                        'Collision' : [], 'Harsh_Steering' : [], 'Harsh_Deceleration' : [], 'Harsh_Acceleration' : [],
                        'Harsh_Preferred_Speed' : []})
tripCount = -1
tripNumbers = len(ignitionOnList)
for tripNumber in range(tripNumbers):
    tripCount += 1
    # .loc slicing is label-based and includes both endpoints, so no +1 is needed
    dfTemp = df.loc[ignitionOnList[tripNumber]:ignitionOffList[tripNumber]]
    # Doing stuff to this temporary data frame and storing the results, for example:
    # (.iloc is positional, so it works even though dfTemp keeps df's index labels)
    dfTrips.loc[tripCount,'Start_Time'] = dfTemp['Time'].iloc[0].strftime("%H:%M:%S")
    dfTrips.loc[tripCount,'Finish_Time'] = dfTemp['Time'].iloc[-1].strftime("%H:%M:%S")
    # Using functions I have defined, such as `get_steering_risk`, to get risky behaviour for each trip
    dfTrips.loc[tripCount,'Harsh_Deceleration'] = get_deceleration_risk(dfTemp)
    dfTrips.loc[tripCount,'Harsh_Steering'] = get_steering_risk(dfTemp)
This works. But I am guessing there are better ways to do this in Python without for loops. I am not sure I can simply use apply because I am not applying the same function to the whole data frame.
An alternative might be to redefine the functions so that they each produce a column in df, apply them to the whole data frame, and then aggregate the results for each trip. For example, get_steering_risk could be redefined to return 0 or 1 for each second in df, and the percentage of 1s for each trip would then be Harsh_Steering in dfTrips. However, some functions cannot be applied to the whole data frame. For example, one function regresses velocity against acceleration, and that has to be done trip by trip. What is the best way to approach this? Thanks.
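To make the per-second alternative concrete, here is a minimal sketch of what I have in mind. The toy data, the Steering_Rate column, the Harsh_Steering_Flag column, and the threshold value are all made up for illustration; they are not part of my real data. It also assumes every row belongs to the trip opened by the most recent 'Ignition On' (no rows between trips).

```python
import pandas as pd

# Toy data: two trips, one row per second (values are invented for illustration).
df = pd.DataFrame({
    'Event_Type': ['Ignition On', 'Driving', 'Driving', 'Ignition Off',
                   'Ignition On', 'Driving', 'Ignition Off'],
    'Steering_Rate': [0.0, 5.0, 40.0, 0.0, 0.0, 35.0, 50.0],  # hypothetical column
})

# Label each row with a trip number: the counter goes up at every 'Ignition On'.
df['Trip_Number'] = (df['Event_Type'] == 'Ignition On').cumsum() - 1

# Hypothetical per-second flag: 1 if steering is harsh in that second, else 0.
HARSH_STEERING_THRESHOLD = 30.0  # assumed value, not from my real functions
df['Harsh_Steering_Flag'] = (
    df['Steering_Rate'].abs() > HARSH_STEERING_THRESHOLD
).astype(int)

# Percentage of harsh seconds in each trip.
harsh_steering = df.groupby('Trip_Number')['Harsh_Steering_Flag'].mean() * 100
print(harsh_steering)
```

This covers the functions that can be computed row by row, but not the ones like the regression that need the whole trip at once.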