I have the following dataframe in python pandas which is over 4 million rows,
ID      BOOKING_TIME    ENTRY_TIME              
23239   1/1/2020 0:00   1/1/2020 0:40                 
51042   1/1/2020 0:11   1/1/2020 0:42                 
73373   1/1/2020 0:15   1/1/2020 0:56                   
14222   1/1/2020 0:22   1/1/2020 1:00                   
27116   1/1/2020 0:55   1/1/2020 1:15                    
....
The two columns have been merged from 2 different sql tables.
The same user could make a booking further down the table again.
Each row shows a user creating a booking (entering the queue to enter the venue) and the time the user enters the venue.
I am trying to create a column that shows the number of people that are currently queuing based on the time a user makes a new booking.
When a booking is made, an entry under BOOKING_TIME is recorded and they join a queue.
When they enter the venue, the ENTRY_TIME is recorded and they leave the queue.
I want to do a running count of the people that are in the queue. Meaning that in each row, if a booking time is hit, it does +1 in the IN_QUEUE column and once the entry time (for any of the cases in the previous timestamps) is hit, I want to -1 from the IN_QUEUE.
It will create a new column like this. (Ignore the working column, that is just to show the process)
ID      BOOKING_TIME    ENTRY_TIME      IN_QUEUE           (working)
23239   1/1/2020 0:00   1/1/2020 0:40   1                   +1
51042   1/1/2020 0:11   1/1/2020 0:42   2                   +1
73373   1/1/2020 0:15   1/1/2020 0:56   3                   +1
14222   1/1/2020 0:22   1/1/2020 1:00   4                   +1
27116   1/1/2020 0:55   1/1/2020 1:15   3                   -1   +1
....
Row 5 will -1 first because before 0:55am on 1/1/2020, the person from the first row entered the venue at 0:40am.
I think I have to use an apply function with a conditional function on every row but I am not sure how it would work.
I was thinking maybe for each row, count all rows where BOOKING_TIME < CURRENT BOOKING_TIME and ENTRY_TIME > CURRENT BOOKING_TIME, but I'm not sure how to code it or if my logic is even correct.
I saw this post Pandas: conditional rolling count but I cant figure out how to make it work with my case.
 
     
     
     
    