I have two dataframes and I want to subtract counter1 with counter2. Preferably with a added column like 'diff'.
This is my attempt so far:
import pandas as pd
import numpy as np
file = ('data.csv')
df = pd.read_csv(file)
df = df[['Release', 'Created Date', 'Finished Date']]
x = df.groupby(['Release', 'Created Date'])['Created Date'].size().to_frame('size1')
y = df.groupby(['Release', 'Finished Date'])['Finished Date'].count().to_frame('size2')
x['counter1'] = x.groupby('Release').size1.cumsum().to_frame().sort_values('Created Date')
y['counter2'] = y.groupby('Release').size2.cumsum().to_frame().sort_values('Finished Date')
print(x)
print(y)
Output for X:
                                            size1  counter1
Release                       Created Date                 
Sony                          2020-07-09        1         1
                              2020-07-14        1         2
Sega                          2020-06-30        1         1
                              2020-07-09        1         2
                              2020-07-13        1         3
                              2020-07-14        1         4
                              2020-07-15        2         6
                              2020-07-17        2         8
                              2020-07-21        1         9
Nintendo                      2020-06-29        1         1
                              2020-07-01        2         3
                              2020-07-06        1         4
Output for y:
                                             size2  counter2
Release                       Finished Date                 
Sony                          2020-07-17         1         1
                              2020-07-20         1         2
Sony                          2020-07-03         1         1
                              2020-07-13         1         2
                              2020-07-17         1         3
                              2020-07-20         1         4
                              2020-07-23         3         7
                              2020-07-24         1         8
                              2020-07-28         1         9
Nintendo                      2020-07-09         1         1
                              2020-07-10         1         2
                              2020-07-15         1         3
This is my attempt, but the result is very confusing and certainly not correct:
t = x['counter1'] - y['counter2']
Had to remove output because of code compared to text was too much, but the output was very strange anyway.
EDIT.
print(df)
output:
Release        Created Date Finished Date
0   Sony       2020-07-21    2020-07-23
1   Sony       2020-07-17    2020-07-28
2   Sony       2020-07-17    2020-07-23
3   Sony       2020-07-15    2020-07-17
4   Sony       2020-07-15    2020-07-24
..                            ...          ...           ...
76  Sony       2020-06-02    2020-06-04
77  Sega       2020-06-01    2020-06-12
79  Sega       2020-06-01    2020-07-22
80  Sony       2020-06-01    2020-06-16
81  Nintendo   2020-06-01    2020-07-16
The goal is to make a dataset for a timeline chart, in which the dates are on the x-axis and when a Release is created it should go up on the y-axis, and then when it's finished it should go down on the y-axis.
Maybe I'm making it too complicated.
Update:
The help I got from a fellow community member made me accomplish the goal, much appreciated. Now I want to build upon that and create a multi timeline chart, with multiple releases shown in the same graph.
Here is the working solution for a single timeline chart.
// This is how I managed to get it working for a single release, but this will eventually become a problem later when I want all the releases.
df = df[df['Release'].str.contains("Sony")]
deposits = pd.Series(df.groupby('Created').size())
withdrawals = pd.Series(df.groupby('Finished').size())
balance = pd.DataFrame({'net_movements': deposits.sub(withdrawals, fill_value=0)})
balance = balance.assign(active=balance.net_movements.cumsum())
balance = balance.rename(columns={"active": "Sony"})
print(balance)
Output:
            net_movements  Sony
2020-06-01            3.0   3.0
2020-06-02            2.0   5.0
2020-06-03            2.0   7.0
2020-06-04           -1.0   6.0
2020-06-05            0.0   6.0
2020-06-08            1.0   7.0
We can remove the net_movements and make the final format complete:
balance = balance.drop(['net_movements'], axis=1)
print(balance)
             Sony
2020-06-01   3.0
2020-06-02   5.0
2020-06-03   7.0
2020-06-04   6.0
2020-06-05   6.0
2020-06-08   7.0
This solved my problem with displaying a single release. Now I want to build upon this and display all releases in the same graph.
Here is my attempt:
deposits = pd.Series(df.groupby(['Release', 'Created']).size())
print(deposits)
output: (shortened down)
Release                        Created   
Sega                           2020-06-01    1
                               2020-06-04    1
                               2020-07-14    1
Nintendo                       2020-06-01    3
                               2020-06-02    2
                               2020-06-03    2
withdrawals = pd.Series(df.groupby(['Release', 'Finished']).size())
print(withdrawals)
Release                        Finished  
Sony                           2020-06-12    1
                               2020-06-16    2
                               2020-06-18    1
Nintendo                       2020-06-04    1
                               2020-06-05    1
                               2020-06-16    2
Now, here is where it gets complicated. Not only does the columns appear everywhere, but the active column doesn't reset when it reaches a new release, it keep ticking over the releases.
balance = balance.assign(active=balance.net_movements.cumsum())
print(balance)
Output:
                                                         net_movements  active
Release                       Created    Finished                         
Sony                          2020-06-01 2020-06-12              1       1
                                         2020-06-16              2       3
                                         2020-06-18              0       3
Nintendo                      2020-06-04 2020-06-12             -1       2
                                         2020-06-16              1       3
                                         2020-06-18              0       3
Wanted format (with dummy values) :
             Sony     Nintendo
2020-06-01   3.0           4.0
2020-06-02   5.0           5.0
2020-06-03   7.0           2.0
2020-06-04   6.0           4.0
2020-06-05   6.0           4.0
2020-06-08   7.0           7.0
It's hard asking the correct question with as little information as possible, but at the same time not to little, this one got a bit long, but hopefully I explained my goal and problem reasonably well.
 
    