Hi everyone. A similar problem has already been asked by others: Splitting dictionary/list inside a Pandas Column into Separate Columns
I have also asked this question before, but it was not resolved: How to use pandas to build a column which are in a dataframe
Now I have a dataframe that looks like this.
                     intron_id                                            octamer
0       >ENSG00000183943.1  AGCCATGC:1 AGUAGCUG:1 GCCUGGCC:1 AGAUGAUG:1 AG...
1       >ENSG00000183943.2  CATATTTC:1 UCCCAAAA:1 AAGCCATA:1 TATTTTGC:1 TA...
2       >ENSG00000183943.3  AGUAGCUG:4 UCAACAGG:1 CCUUUCAU:1 UACCUUUU:1 GC...
3       >ENSG00000183943.4  AUGAGCAC:1 UCCUACGG:1 GGAGGATC:1 AUAGGGUG:1 CC...
4       >ENSG00000183943.5  UUGCCAAU:1 AUGCUGGG:1 ACUAUUUU:1 GGAGGATC:3 UG...
I want to transform it into this, with one column per octamer and the counts as values:
intron_id           AGCCATGC  AGUAGCUG  GCCUGGCC  ......
>ENSG00000183943.1  1         1         1
>ENSG00000183943.2  0         0         0
>ENSG00000183943.3  0         4         0
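To make the target layout concrete, here is a minimal sketch on a tiny made-up frame. The three rows and their shortened octamer strings are hypothetical samples, not the real data, and `parse_counts` is an illustrative helper name. It parses each `OCTAMER:count` string into a dict, builds one column per octamer, and fills octamers that are absent from a row with 0.

```python
import pandas as pd

# Hypothetical, shortened sample rows in the same "OCTAMER:count" format
sample = pd.DataFrame({
    "intron_id": [">ENSG00000183943.1", ">ENSG00000183943.2", ">ENSG00000183943.3"],
    "octamer": [
        "AGCCATGC:1 AGUAGCUG:1 GCCUGGCC:1",
        "CATATTTC:1 UCCCAAAA:1",
        "AGUAGCUG:4 UCAACAGG:1",
    ],
})

def parse_counts(text):
    """Turn 'AGCCATGC:1 AGUAGCUG:1' into {'AGCCATGC': 1, 'AGUAGCUG': 1}."""
    pairs = (item.split(":") for item in text.split())
    return {octamer: int(count) for octamer, count in pairs}

# One dict per row -> one column per octamer; absent octamers become NaN, then 0
counts = pd.DataFrame(sample["octamer"].map(parse_counts).tolist(), index=sample.index)
counts = counts.fillna(0).astype(int)

# Put the identifier column back in front of the count columns
wide = pd.concat([sample[["intron_id"]], counts], axis=1)
print(wide)
```

With many rows and tens of thousands of distinct octamers the resulting table is very wide and mostly zeros, so memory use may become a concern on the full data set.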
But when I tried apply(pd.Series) or df.octamer.values.tolist(), neither of them worked. I am confused and hope you can give me some advice. Thank you in advance. My code is as follows.
import pandas as pd
df = pd.read_csv('~/10genomic/elife/octamer/intron_seq/count.txt', delimiter='\t', header=None)
df.rename(columns={0: "intron_id", 1: "octamer"}, inplace=True)
df['octamer'] = df['octamer'].apply(lambda x: str(x))
print(df)
                 intron_id                                            octamer
0       >ENSG00000183943.1  AGCCATGC:1 AGUAGCUG:1 GCCUGGCC:1 AGAUGAUG:1 AG...
1       >ENSG00000183943.2  CATATTTC:1 UCCCAAAA:1 AAGCCATA:1 TATTTTGC:1 TA...
2       >ENSG00000183943.3  AGUAGCUG:4 UCAACAGG:1 CCUUUCAU:1 UACCUUUU:1 GC...
3       >ENSG00000183943.4  AUGAGCAC:1 UCCUACGG:1 GGAGGATC:1 AUAGGGUG:1 CC...
4       >ENSG00000183943.5  UUGCCAAU:1 AUGCUGGG:1 ACUAUUUU:1 GGAGGATC:3 UG...
df.drop(labels=[2370,3967,5728,11875,14464],axis=0,inplace=True)
def builddict(x):
    # Turn "AGCCATGC:1 AGUAGCUG:1 ..." into {'AGCCATGC': '1', 'AGUAGCUG': '1', ...}
    dictls = []
    for item in x.split(" "):
        dictls.append(item.split(":"))
    return dict(dictls)
df['octamer'] = df['octamer'].apply(builddict)
print(df)
                intron_id                                            octamer
0       >ENSG00000183943.1  {'AGCCATGC': '1', 'AGUAGCUG': '1', 'GCCUGGCC':...
1       >ENSG00000183943.2  {'CATATTTC': '1', 'UCCCAAAA': '1', 'AAGCCATA':...
2       >ENSG00000183943.3  {'AGUAGCUG': '4', 'UCAACAGG': '1', 'CCUUUCAU':...
3       >ENSG00000183943.4  {'AUGAGCAC': '1', 'UCCUACGG': '1', 'GGAGGATC':...
4       >ENSG00000183943.5  {'UUGCCAAU': '1', 'AUGCUGGG': '1', 'ACUAUUUU':...
print(df['octamer'].apply(pd.Series))
                                                      0
0      {'AGCCATGC': '1', 'AGUAGCUG': '1', 'GCCUGGCC':...
1      {'CATATTTC': '1', 'UCCCAAAA': '1', 'AAGCCATA':...
2      {'AGUAGCUG': '4', 'UCAACAGG': '1', 'CCUUUCAU':...
3      {'AUGAGCAC': '1', 'UCCUACGG': '1', 'GGAGGATC':...
4      {'UUGCCAAU': '1', 'AUGCUGGG': '1', 'ACUAUUUU':...
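One thing worth checking at this point is whether the cells are real dict objects or strings that only look like dicts. The following is a minimal sketch on hypothetical two-row data (the names `as_dicts` and `as_strings` are made up for illustration). It suggests that `apply(pd.Series)` expands real dicts into one column per key, while strings stay in a single column named 0, which resembles the output above.

```python
import pandas as pd

# Hypothetical two-row examples: real dict objects vs. their string representation
as_dicts = pd.Series([{"AGCCATGC": "1", "AGUAGCUG": "1"}, {"AGUAGCUG": "4"}])
as_strings = as_dicts.astype(str)

expanded = as_dicts.apply(pd.Series)        # one column per dict key
not_expanded = as_strings.apply(pd.Series)  # a single column named 0

# Compare the cell types and the resulting column labels
print(type(as_dicts.iloc[0]), list(expanded.columns))
print(type(as_strings.iloc[0]), list(not_expanded.columns))
```

Running `type(df['octamer'].iloc[0])` on the real frame would show which of the two cases applies.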
When I tried to solve it as follows, reading back a file in which the dicts had been saved, it produced the wrong result below. I am really confused.
df = pd.read_csv('~/10genomic/elife/octamer/intron_seq/countdict.txt', delimiter=',', index_col=0)
df = df.iloc[:3, :]
print(df)
            intron_id                                            octamer
0  >ENSG00000183943.1  {'AGCCATGC': '1', 'AGUAGCUG': '1', 'GCCUGGCC':...
1  >ENSG00000183943.2  {'CATATTTC': '1', 'UCCCAAAA': '1', 'AAGCCATA':...
2  >ENSG00000183943.3  {'AGUAGCUG': '4', 'UCAACAGG': '1', 'CCUUUCAU':...
temp_df=pd.DataFrame.from_records(df.pop("octamer"))
print(temp_df)
0     1     2     3     4     5      ... 73895 73896 73897 73898 73899 73900
0     {     '     A     G     C     C  ...  None  None  None  None  None  None
1     {     '     C     A     T     A  ...  None  None  None  None  None  None
2     {     '     A     G     U     A  ...     :           '     1     '     }
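The output above looks as if every cell was split into single characters. That is what I would expect if the `octamer` column holds strings (as it would after a round trip through a CSV file) rather than dicts. A possible way around it is sketched below on a hypothetical in-memory CSV standing in for countdict.txt: parse each string back into a dict with `ast.literal_eval` before building the frame. The helper name `to_count_dict` and the two sample rows are assumptions for illustration, and the plain `pd.DataFrame` constructor is used here instead of `from_records`.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for countdict.txt: the dicts were written to CSV as text
csv_text = '''\
,intron_id,octamer
0,>ENSG00000183943.1,"{'AGCCATGC': '1', 'AGUAGCUG': '1'}"
1,>ENSG00000183943.3,"{'AGUAGCUG': '4', 'UCAACAGG': '1'}"
'''
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

def to_count_dict(cell):
    """Parse a dict-like string safely and convert the counts to int."""
    return {octamer: int(count) for octamer, count in ast.literal_eval(cell).items()}

# After read_csv the cells are plain strings, so parse them back into dicts
dicts = df.pop("octamer").map(to_count_dict)

# One column per octamer; keys missing from a row become NaN, filled with 0 here
temp_df = pd.DataFrame(dicts.tolist(), index=df.index).fillna(0).astype(int)

result = df.join(temp_df)
print(result)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text read from a file.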
 
     
    


