I am writing a count function that operates on subsets of a pandas DataFrame, and I intend to export a dictionary/spreadsheet that contains only the groupby criteria and the counting results.
In [1]: import pandas as pd
   ...: df = pd.DataFrame([
   ...:     ['Buy', 'A', 123, 'NEW', 500, '20190101-09:00:00am'],
   ...:     ['Buy', 'A', 124, 'CXL', 500, '20190101-09:00:01am'],
   ...:     ['Buy', 'A', 125, 'NEW', 500, '20190101-09:00:03am'],
   ...:     ['Buy', 'A', 126, 'REPLACE', 300, '20190101-09:00:10am'],
   ...:     ['Buy', 'B', 210, 'NEW', 1000, '20190101-09:10:00am'],
   ...:     ['Buy', 'B', 345, 'NEW', 200, '20190101-09:00:00am'],
   ...:     ['Sell', 'C', 412, 'NEW', 100, '20190101-09:00:00am'],
   ...:     ['Sell', 'C', 413, 'NEW', 200, '20190101-09:01:00am'],
   ...:     ['Sell', 'C', 414, 'CXL', 50, '20190101-09:02:00am'],
   ...: ], columns=['side', 'sender', 'id', 'type', 'quantity', 'receive_time'])
   ...: df
Out[1]: 
   side  sender  id    type     quantity  receive_time 
0  Buy   A       123   NEW      500       20190101-09:00:00am
1  Buy   A       124   CXL      500       20190101-09:00:01am
2  Buy   A       125   NEW      500       20190101-09:00:03am
3  Buy   A       126   REPLACE  300       20190101-09:00:10am
4  Buy   B       210   NEW      1000      20190101-09:10:00am
5  Buy   B       345   NEW      200       20190101-09:00:00am
6  Sell  C       412   NEW      100       20190101-09:00:00am
7  Sell  C       413   NEW      200       20190101-09:01:00am
8  Sell  C       414   CXL      50        20190101-09:02:00am
The count function is as below (mydf is passed in as a subset of the dataframe):
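def ordercount(mydf):
    # Running quantity for the subset passed in
    num = 0.0
    if mydf.type == 'NEW':
        # A new order adds its quantity
        num = num + mydf.quantity
    elif mydf.type == 'REPLACE':
        # A replace overwrites the running quantity
        num = mydf.quantity
    elif mydf.type == 'CXL':
        # A cancel subtracts its quantity
        num = num - mydf.quantity
    else:
        pass
    orderdict = dict.fromkeys([mydf.side, mydf.sender, mydf.id], num)
    return orderdict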
After reading the data from a CSV file, I group it by some criteria and also sort it by time:
df = pd.read_csv('xxxxxxxxx.csv', sep='|', header=0, engine='python', names=col_names)
sorted_df = df.groupby(['side', 'sender', 'id']).apply(lambda _df: _df.sort_values(by=['time']))
Then I call the previously defined function on the sorted data:
print(sorted_df.agg(ordercount))
But a ValueError keeps coming up, saying there are too many lines to call.
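One possible source of a ValueError in code shaped like this, offered as a guess since the full traceback is not shown: when the subset holds more than one row, a comparison such as `mydf.type == 'NEW'` yields a boolean Series rather than a single True/False, and pandas refuses to use that in an `if`. The small frame and the names `frame`, `mask`, `raised` and `message` below are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row subset, only to reproduce the behaviour.
frame = pd.DataFrame({"type": ["NEW", "CXL"], "quantity": [500, 500]})

# Comparing a column with a string gives one boolean per row.
mask = frame["type"] == "NEW"

try:
    if mask:  # pandas cannot reduce several booleans to one truth value
        pass
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(raised)
print(message)
```

If this is what happens, the branching has to run once per row (or be expressed with vectorised operations) instead of once per subset.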
This function-based way of counting may not be efficient, but it is the most straightforward way I can think of to match order types and count quantities accordingly. I expect the program to output a table where only side, sender, id and counted quantity are shown. Is there any way to achieve this? Thanks.
Expected output:
   side   sender   total_order_num   trade_date 
0  Buy    A        300               20190101
1  Buy    B        1200              20190101
2  Sell   C        250               20190101
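To make the intended rule concrete, here is a minimal sketch of the counting, not a definitive solution. It assumes the totals are kept per side and sender (as in the expected output, which has no id column), that orders are processed in receive_time order, that the timestamps use a 12-hour clock with an am/pm suffix, and that trade_date is the first eight characters of receive_time. The names `running_total`, `parsed_time`, `ordered` and `records` are hypothetical.

```python
import pandas as pd

rows = [
    ["Buy", "A", 123, "NEW", 500, "20190101-09:00:00am"],
    ["Buy", "A", 124, "CXL", 500, "20190101-09:00:01am"],
    ["Buy", "A", 125, "NEW", 500, "20190101-09:00:03am"],
    ["Buy", "A", 126, "REPLACE", 300, "20190101-09:00:10am"],
    ["Buy", "B", 210, "NEW", 1000, "20190101-09:10:00am"],
    ["Buy", "B", 345, "NEW", 200, "20190101-09:00:00am"],
    ["Sell", "C", 412, "NEW", 100, "20190101-09:00:00am"],
    ["Sell", "C", 413, "NEW", 200, "20190101-09:01:00am"],
    ["Sell", "C", 414, "CXL", 50, "20190101-09:02:00am"],
]
df = pd.DataFrame(
    rows, columns=["side", "sender", "id", "type", "quantity", "receive_time"]
)

# Assumption: timestamps look like 20190101-09:00:00am (12-hour clock).
df["parsed_time"] = pd.to_datetime(df["receive_time"], format="%Y%m%d-%I:%M:%S%p")
# Assumption: the trade date is the leading YYYYMMDD part of receive_time.
df["trade_date"] = df["receive_time"].str[:8]


def running_total(group):
    """Walk the rows of one group in order and apply the per-type rule."""
    total = 0
    for order_type, quantity in zip(group["type"], group["quantity"]):
        if order_type == "NEW":
            total += quantity      # new order adds
        elif order_type == "REPLACE":
            total = quantity       # replace overwrites the running total
        elif order_type == "CXL":
            total -= quantity      # cancel subtracts
    return total


# Sort once by time; groupby keeps the row order inside each group.
ordered = df.sort_values("parsed_time", kind="stable")

records = []
for (side, sender, trade_date), group in ordered.groupby(
    ["side", "sender", "trade_date"]
):
    records.append(
        {
            "side": side,
            "sender": sender,
            "total_order_num": int(running_total(group)),
            "trade_date": trade_date,
        }
    )

result = pd.DataFrame(records)
print(result)
```

On the sample data this gives totals of 300, 1200 and 250 for Buy/A, Buy/B and Sell/C. The explicit loop over groups is used instead of `groupby(...).agg(...)` because `agg` passes each column to the function separately, while this rule needs the type and quantity of each row together.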
 
    