I have a DataFrame like below, it's a transaction log:
c_id t_type unit  effective_date
1    enter  50     2/5/2020
2    exit   25     2/5/2020
1    exit   50     2/5/2020
2    enter  100    2/8/2020
3    enter  10     2/8/2020
1    enter  17     2/8/2020
3    exit   25     2/8/2020
3    exit   25     2/12/2020
I have a definition of active customer: a c_id who is entered units > exited_units.
For each day , I want to know how many active customers exists from beginning of the log.
Simplest way occur to mind is use for loop to consider each date as most recent day and get the count of active customers from beginning to that day, but I wanted more optimized scalable Pythonic/pandas way to solve this problem.
edit: My desired out put will be something like this:
effective_date  active_count
  2/5/2020           ?
  2/8/2020           ?
  2/12/2020          ?
  2/13/2020          ?
  2/14/2020          ?
  2/19/2020          ?
  2/20/2020          ?
replace question marks with active customer count of that day from beginning of the log. real issue here is For each day, we need to calculate the count from beginning of the log.
 
    