I am trying to do some calculations. This works fine:
import pandas
import numpy
print(pandas.__version__) #=> 0.25.2
columns = ['CUR', 'ISSUER_INDUSTRY', 'PROFIT']
data = [
    {'CUR': 'NOK', 'ISSUER_INDUSTRY': 'BANK', 'PROFIT': 10},
    {'CUR': 'NOK',  'ISSUER_INDUSTRY': 'BANK', 'PROFIT': 60},
    {'CUR': 'NOK', 'ISSUER_INDUSTRY': 'BANK', 'PROFIT': 80},
    {'CUR': 'SEK', 'ISSUER_INDUSTRY': 'BANK', 'PROFIT': 30},
    {'CUR': 'SEK', 'ISSUER_INDUSTRY': 'TECH', 'PROFIT': 30},
]
df = pandas.DataFrame(data=data, columns=columns)
df['above_50_quantile_profit'] = df.PROFIT >= df.groupby(['CUR','ISSUER_INDUSTRY']).PROFIT.transform('quantile')
df['above_90_quantile_profit'] = df.PROFIT >= df.groupby(['CUR','ISSUER_INDUSTRY']).PROFIT.transform(lambda x: x.quantile(0.9))
The DF now looks like
NOK     BANK    10  False   False
NOK     BANK    60  True    False
NOK     BANK    80  True    True
SEK     BANK    30  True    True
SEK     TECH    30  True    True
But in reality data is never this clean. So I have some rows that are missing industry data:
df.loc[df.CUR=='SEK','ISSUER_INDUSTRY'] = numpy.nan
Now my first calculation works fine:
df['above_50_quantile_profit'] = df.PROFIT >= df.groupby(['CUR','ISSUER_INDUSTRY']).PROFIT.transform('quantile')
but my second:
df['above_90_quantile_profit'] = df.PROFIT >= df.groupby(['CUR','ISSUER_INDUSTRY']).PROFIT.transform(lambda x: x.quantile(0.9))
now raised:
ValueError: Length mismatch: Expected axis has 3 elements, new values have 5 elements
I understand that grouping on NaN is a well know problem (see here), what I do not understand is why the first version .transform('quantile') have the same problem, as it is also a groupby call.
Could anyone help me understand the difference?
