When I group 'time_interval_code' the values of 'vehicle_real' in a result file are correct only for the first group, but not for the others. When 'time_interval_code' was used in previous group it seems that it is not in the sum of the new group. How to make sure 'vehicle _real' values are available to sum in every group?
The idea of 'time_interval_code' was to get rid of time format. I have 8 time intervals in the morning (07:00 - 07:15 is 1, 07:15 - 07:30 - 2, etc. up to 8).
I want to check maximum flow rate in an hour by adding 15 minutes each time for every junction and every direction from which cars were entering a junction. The measurements are given in 15 minutes interval, so I need to check 4 intervals every time. Results to be 'junction_id', 'source_direction' and sum of the 'vehicle_real' for that junction, direction and group of 'time_interval_code'.
To solve this I created groups that contains 4 time intervals. The problem I have is when I group 'time_interval_code' the values of 'vehicle_real' in a result file are correct only for the first group (1,2,3,4), but not for the others.
import pandas as pd
data = pd.read_excel("traffic.xlsx")
# Create a DataFrame from the list of data
df = pd.DataFrame(data)
# Define a function to get the morning groups for each time interval code
def get_morning_group(time_interval_code):
  morning_groups = [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7), (5, 6, 7, 8)]
  for group in morning_groups:
    if time_interval_code in group:
      return group
# Add a new column to the DataFrame that contains the morning groups for each time interval code
df['morning_groups'] = df['time_interval_code'].apply(get_morning_group)
# Group data by values
grouped_data = df.groupby(['junction_id', 'source_direction', 'morning_groups'])
# Calculate the sum of the vehicles_real values for each group
grouped_data = grouped_data['vehicles_real'].sum()
# Convert the grouped data back into a DataFrame
df = grouped_data.reset_index()
# Create the pivot table
pivot_table = df.pivot_table(index=['junction_id', 'source_direction'], columns=['morning_groups'], values='vehicles_real')
# Save the pivot table to a new Excel file
pivot_table.to_excel('max_flow_rate.xlsx')
The traffic.xlsx file has ca. 140k records. Every junction has at least 2 'source_direction' values. Junction with 'source_direction' has 'vehicles_real' values for every 'time_interval_code'. The file looks like this:
| id | time_interval_code | junction_id | source_direction | vehicles_real | 
|---|---|---|---|---|
| 1 | 3 | 1001 | N | 140 | 
| 2 | 1 | 2002 | E | 10 | 
| 18 | 2 | 2011 | W | 41 | 
| 21 | 5 | 2030 | S | 2 | 
| 33 | 8 | 2030 | N | 140 | 
| 35 | 7 | 2150 | E | 10 | 
| 41 | 6 | 2150 | W | 41 | 
| 52 | 5 | 2150 | S | 2 | 
The output I get is fine, but the values are correct only for (1,2,3,4).
| junction_id | source_direction | (1,2,3,4) | (2,3,4,5) | (3,4,5,6) | (4,5,6,7) | (5,6,7,8) | 
|---|---|---|---|---|---|---|
| 1001 | N | 257 | 95 | 69 | 61 | 59 | 
| 1001 | S | 456 | 120 | 136 | 153 | 111 | 
| 1002 | N | 2597 | 676 | 670 | 619 | 645 | 
| 1002 | S | 2571 | 552 | 641 | 656 | 595 | 
| 1003 | N | 586 | 181 | 148 | 127 | 142 | 
| 1003 | S | 711 | 174 | 147 | 157 | 141 |