I have a DataFrame having these kind of data :
df = pd.DataFrame({
    'id' : ['a', 'a', 'b', 'b', 'c', 'c'],
    'alias' : ['value'+str(i) for i in range(6)],
    'source' : ['src1', 'src2', 'src1', 'src2', 'src1', 'src3']
})
print(df)
output :
  id   alias source
0  a  value0   src1
1  a  value1   src2
2  b  value2   src1
3  b  value3   src2
4  c  value4   src1
5  c  value5   src3
And i want to change the structure of the dataframe to get somthing like this :
     src1    src2    src3
a  value0  value1    None
b  value2  value3    None
c  value4    None  value5
So to process this task i made this function :
def process_aliases(df):
    sources = set(df['source'])
    indexes = list(set(df['id']))
    cols = {source:[] for source in sources}
    for index in indexes:
        subdf = df[df['id']==index]
        for source in sources:
            alias = subdf[df['source'] == source]['alias']
            cols[source].append(alias.iloc[0].strip() if len(alias) == 1 else None)
    return pd.DataFrame(cols, index=indexes)
My problem is that my dataframe contains more than 1 000 000 lines. So this function take too much time to be processed. More than 1 hour of execution time using Google Colab.
I don't know if my function is the best way to do this. So, i'm asking for help.
What is the best way to accomplish this task ? Or, what is the fastest way to search for elements over a DataFrame ?
 
    