This is my pandas DataFrame with original column names.
old_dt_cm1_tt   old_dm_cm1   old_rr_cm2_epf   old_gt
1               3            0                0
2               1            1                5
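To make the example reproducible, the frame above can be rebuilt as follows. This is a minimal sketch with the values copied from the table; the constructor itself is not part of the original post.

```python
import pandas as pd

# Rebuild the sample frame shown above so that it can be run as-is.
df = pd.DataFrame({
    "old_dt_cm1_tt": [1, 2],
    "old_dm_cm1": [3, 1],
    "old_rr_cm2_epf": [0, 1],
    "old_gt": [0, 5],
})
print(df)
```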
- First, I want to extract all unique cm variations, e.g. in this case cm1 and cm2.
- After that, I want to create one new column for each unique cm. In this example there should be 2 new columns.
- Finally, each new column should store, per row, the count of non-zero values in the original columns that contain that cm, i.e.:
old_dt_cm1_tt   old_dm_cm1   old_rr_cm2_epf   old_gt   cm1   cm2
1               3            0                0        2     0
2               1            1                5        2     1
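The counting rule from the last step can be checked on the cm1 columns alone. This is a minimal sketch in which the two cm1 column names are hard-coded purely for illustration; an automatic solution still has to discover them.

```python
import pandas as pd

df = pd.DataFrame({
    "old_dt_cm1_tt": [1, 2],
    "old_dm_cm1": [3, 1],
    "old_rr_cm2_epf": [0, 1],
    "old_gt": [0, 5],
})

# Hard-coded for illustration only: the original columns that contain "cm1".
cm1_cols = ["old_dt_cm1_tt", "old_dm_cm1"]

# Compare every cell with 0, then count the True values across each row.
cm1_counts = df[cm1_cols].ne(0).sum(axis=1)
print(cm1_counts.tolist())  # [2, 2]
```

This reproduces the cm1 column of the expected output.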
I implemented the first step as follows:
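import pandas as pd

cols = pd.DataFrame(list(df.columns))
ind = [c for c in df.columns if 'cm' in c]  # column names containing 'cm'
df.loc[:, ind].columns  # .ix was removed in pandas 1.0; .loc selects by label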
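A sketch of step 1 that returns the cm variations themselves (cm1, cm2) rather than the matching column names. It assumes every variation has the form "cm" followed by digits and is delimited by underscores or by the start/end of the column name; that pattern is an assumption about the real data set.

```python
import pandas as pd

df = pd.DataFrame({
    "old_dt_cm1_tt": [1, 2],
    "old_dm_cm1": [3, 1],
    "old_rr_cm2_epf": [0, 1],
    "old_gt": [0, 5],
})

# Assumed shape of a cm variation: "cm" + digits, delimited by "_" or the
# start/end of the name (so cm1 is not confused with cm10).
pattern = r"(?:^|_)(cm\d+)(?:_|$)"

# With one capture group and expand=False, Index.str.extract returns an Index
# holding each column's token, or NaN for columns without one (old_gt here).
tokens = df.columns.str.extract(pattern, expand=False)
unique_cms = sorted(tokens.dropna().unique())
print(unique_cms)  # ['cm1', 'cm2']
```

Unlike the substring test 'cm' in c, this pattern would not match a hypothetical column such as old_acme_x.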
How can I proceed with steps 2 and 3 so that the solution is automatic? I don't want to define the column names cm1 and cm2 manually, because the original data set may contain many cm variations.
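One possible automatic approach to all three steps is sketched below. It assumes the same cm naming pattern as above ("cm" + digits, delimited by "_" or the ends of the name); it is one way to do it, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "old_dt_cm1_tt": [1, 2],
    "old_dm_cm1": [3, 1],
    "old_rr_cm2_epf": [0, 1],
    "old_gt": [0, 5],
})

# Assumption: cm variations are "cm" + digits, delimited by "_" or the ends
# of the column name.
pattern = r"(?:^|_)(cm\d+)(?:_|$)"

# Step 1: cm token of each original column (NaN if the column has none).
tokens = df.columns.str.extract(pattern, expand=False)
unique_cms = sorted(tokens.dropna().unique())

# Steps 2 and 3: one count column per cm. All counts are computed from the
# original columns before any new column is added, so a new cm column can
# never be counted by a later one.
nonzero = df.ne(0)
counts = {cm: nonzero.loc[:, tokens == cm].sum(axis=1) for cm in unique_cms}
df = df.assign(**counts)
print(df)
```

Because the count is taken on a boolean mask, negative values also count as non-zero. If a single column name could contain two different cm tokens, str.extract keeps only the first match, and str.extractall would be needed instead.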