Here is what my dataset looks like:
Name | Country
---------------
Alex | USA
Tony | DEU
Alex | GBR
Alex | USA
I am trying to get output like this, essentially grouping by name and counting countries:
Name | Country
---------------
Alex | {USA:2,GBR:1}
Tony | {DEU:1}
It works, but it is slow on large datasets. Here is my code, which runs fine on smaller DataFrames but takes forever on bigger ones (mine is around 14 million rows). I also use the multiprocessing module to speed things up, but it doesn't help much:
from collections import Counter
import pandas as pd

def countNames(x):
    # Count how often each country appears within one name's group
    return dict(Counter(x))

def aggregate(df_full, nameList):
    df_list = []
    for q in nameList:
        # Filter the full frame down to one name, then group and count that slice
        df = df_full[df_full['Name'] == q]
        df_list.append(df.groupby('Name')['Country']
                         .apply(lambda x: str(countNames(x)))
                         .to_frame().reset_index())
    return pd.concat(df_list)
df = pd.DataFrame({'Name': ['Alex', 'Tony', 'Alex', 'Alex'],
                   'Country': ['USA', 'DEU', 'GBR', 'USA']})[['Name', 'Country']]
aggregate(df, df.Name.unique())
Is there anything that can speed up the internal logic (except for running with multiprocessing)?
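For reference, this is the kind of single-pass rewrite I have been considering, with the per-name loop removed; aggregate_onepass is just a placeholder name of mine, and I have not verified that it is actually faster at 14 million rows:

from collections import Counter
import pandas as pd

def aggregate_onepass(df_full):
    # One groupby over the whole frame, instead of filtering the frame
    # once per unique name and grouping each slice separately
    return (df_full.groupby('Name')['Country']
                   .apply(lambda x: str(dict(Counter(x))))
                   .reset_index())

On the small example above it produces the same two rows as aggregate, but I don't know how it behaves at scale.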