There are two CSV files having same following schema, generated a month apart FILE20221105.csv and FILE20221205.csv.
Key:Int
Val1:String
Val2:String
Val3:String
Val4:String
Some of the Key will have Val column with string “N/A”. I want to find percentage of “N/A” in each column. I can do that the following way after reading CSV file in the dataframe say df
((df.drop([‘Key’], axis=1))==”N/A”).mean()
I will get say something like this (Example)
Val1:0.33
Val2:0.25
Val3:0.1
Val4:0.2
But real ask has one more condition that I am trying to do without a loop The requirement is to find percentage of “N/A” in each Val column in current month but not having “N/A” last month. Both CSV are joined based on column Key (Inner join). Column Key has unique value.
**Example**
FILE20221105.csv (All row with N/A in this month will not be considered)
Key    Val1    Val2    Val3    Val4
101    A1      B1      C1      N/A
102    N/A     B2      C2      D2
103    A3      N/A     C3      D3
104    A4      B4      N/A     D4
105    A5      B1      C1      N/A
106    A6      B2      C2      D2
107    A7      N/A     C3      D3
108    A8      B4      C4      D4
FILE20221205.csv (All row which previously is Non N/A but N/A this month will be considered)
Key    Val1    Val2    Val3    Val4
101    A1      B1      C1      D1
102    A2      B2      C2      D2
103    A3      B3      C3      D3
104    A4      N/A     C4      D4
105    A5      N/A     C1      D1
106    A6      N/A     C2      N/A
107    N/A     B3      C3      N/A
108    N/A     B4      N/A     D4
Output
Val1:5/7 (7 in denominator as 1 value in previous month row was N/A, in current month 2 out of remaining 7 are N/A therefore 5 in numerator)
Val2:3/6 (6 in denominator as 2 value in previous month row was N/A, in current month 3 out of remaining 6 are N/A therefore 3 in numerator)
Val2:6/7 (7 in denominator as 1 value in previous month row was N/A, in current month 1 out of remaining 7 are N/A therefore 5 in numerator)
Val2:4/6 (6 in denominator as 2 value in previous month row was N/A, in current month 2 out of remaining 6 are N/A therefore 4 in numerator)
