I would like to group a DataFrame by partial substrings. This is a sample .csv file:
GridCode,Key
1000,Colour
1000,Colours
1001,Behaviours
1001,Behaviour
1002,Favourite
1003,COLORS
1004,Honours
What I did so far is importing the file as df = pd.read_csv(sample.csv), and then I put all the strings to lowercases with df['Key'] = df['Key'].str.lower(). The first thing I tried is groupby by GridCode and Key with:
g = df.groupby([df['GridCode'],df['Key']]).size()
then unstack and fill:
d = g.unstack().fillna(0)
and the resulting DataFrame is:
Key       behaviour  behaviours  colors  colour  colours  favourite  honours
GridCode                                                                    
1000              0           0       0       1        1          0        0
1001              1           1       0       0        0          0        0
1002              0           0       0       0        0          1        0
1003              0           0       1       0        0          0        0
1004              0           0       0       0        0          0        1
Now what I would like to do is to group only strings containing the substring 'our', in this case avoiding only the colors Key, creating a new column with the desired substring. The expected result would be like:
Key       'our'
GridCode                                                                    
1000        2              
1001        2
1002        1
1003        0
1004        1
I tried also to mask the DataFrame with masked = df['Key'].str.contains('our'), then df1 = df[mask], but I can't figured out how to make a new column with the new groupby counts. Any help would be really appreciated.