My work environment mainly uses PySpark, but from some Googling it looks very complicated to do this kind of column-to-row transformation in PySpark. I would like to keep it in PySpark, but if it is much easier to do in Pandas, I will convert the Spark dataframe to a Pandas dataframe. The dataset isn't big enough for performance to be an issue, I would think.
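The conversion itself is not the problem; assuming spark_df is my Spark dataframe and spark is the active SparkSession, it would just be something like:

pandas_df = spark_df.toPandas()                # Spark -> Pandas
spark_df2 = spark.createDataFrame(pandas_df)   # Pandas -> Spark, if I need to go back afterwards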
I would like to transform a dataframe with multiple columns into rows:
Input:
import pandas as pd
df = pd.DataFrame({'Record': {0: 1, 1: 2, 2: 3},
 'Hospital': {0: 'Red Cross', 1: 'Alberta Hospital', 2: 'General Hospital'},
 'Hospital Address': {0: '1234 Street 429',
  1: '553 Alberta Road 441',
  2: '994 Random Street 923'},
 'Medicine_1': {0: 'Effective', 1: 'Effecive', 2: 'Normal'},
 'Medicine_2': {0: 'Effective', 1: 'Normal', 2: 'Effective'},
 'Medicine_3': {0: 'Normal', 1: 'Normal', 2: 'Normal'},
 'Medicine_4': {0: 'Effective', 1: 'Effective', 2: 'Effective'}})
Record          Hospital       Hospital Address Medicine_1 Medicine_2 Medicine_3 Medicine_4  
     1         Red Cross        1234 Street 429  Effective  Effective     Normal  Effective    
     2  Alberta Hospital   553 Alberta Road 441   Effecive     Normal     Normal  Effective
     3  General Hospital  994 Random Street 923     Normal  Effective     Normal  Effective
Output:
    Record          Hospital       Hospital Address        Name      Value
0        1         Red Cross        1234 Street 429  Medicine_1  Effective
1        2         Red Cross        1234 Street 429  Medicine_2  Effective
2        3         Red Cross        1234 Street 429  Medicine_3     Normal
3        4         Red Cross        1234 Street 429  Medicine_4  Effective
4        5  Alberta Hospital   553 Alberta Road 441  Medicine_1   Effecive
5        6  Alberta Hospital   553 Alberta Road 441  Medicine_2     Normal
6        7  Alberta Hospital   553 Alberta Road 441  Medicine_3     Normal
7        8  Alberta Hospital   553 Alberta Road 441  Medicine_4  Effective
8        9  General Hospital  994 Random Street 923  Medicine_1     Normal
9       10  General Hospital  994 Random Street 923  Medicine_2  Effective
10      11  General Hospital  994 Random Street 923  Medicine_3     Normal
11      12  General Hospital  994 Random Street 923  Medicine_4  Effective
Looking at PySpark examples, it seems complicated: PySpark Dataframe melt columns into rows
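For what it's worth, here is a rough sketch of what I pieced together from that answer using Spark's stack() expression (assuming a SparkSession is already running and df is the Pandas dataframe defined above); I have not fully verified it:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(df)  # df is the Pandas dataframe from the Input section

# stack(4, label1, col1, ...) turns the four Medicine_* columns into (Name, Value) rows
long_sdf = sdf.selectExpr(
    "Record",
    "Hospital",
    "`Hospital Address`",
    "stack(4, "
    "'Medicine_1', Medicine_1, "
    "'Medicine_2', Medicine_2, "
    "'Medicine_3', Medicine_3, "
    "'Medicine_4', Medicine_4) as (Name, Value)"
)
long_sdf.show(truncate=False)

I also saw that newer Spark versions (3.4+, I believe) have a built-in DataFrame.unpivot / melt, but I am not on that version.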
The Pandas examples look much easier, but there are so many different Stack Overflow answers, some saying to use pivot, melt, stack, unstack, and more, that it ends up being confusing.
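For the Pandas route, my best guess based on those answers is pd.melt (again, not certain this is the right choice):

# melt keeps the id columns and unpivots the Medicine_* columns into Name/Value pairs
long_df = df.melt(
    id_vars=["Record", "Hospital", "Hospital Address"],
    value_vars=["Medicine_1", "Medicine_2", "Medicine_3", "Medicine_4"],
    var_name="Name",
    value_name="Value",
)
long_df = long_df.sort_values(["Record", "Name"]).reset_index(drop=True)
# re-number Record per output row, as in my desired output above
long_df["Record"] = range(1, len(long_df) + 1)
print(long_df)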
So if anyone has an easy way to do this in PySpark, I am all ears. If not, I will happily take Pandas answers.
Thank you very much for your help!