I have a large number of columns in a PySpark dataframe, say 200. I want to select all the columns except say 3-4 of the columns. How do I select these columns without having to manually type the names of all the columns I want to select?
- use `drop` with columns you'd like to exclude. – Vamsi Prabhala Jun 13 '18 at 13:14
- `df.select([c for c in df.columns if c not in {'GpuName','GPU1_TwoPartHwID'}])` – vvg Jun 13 '18 at 14:18
- Possible duplicate of [How to exclude multiple columns in Spark dataframe in Python](https://stackoverflow.com/questions/35674490/how-to-exclude-multiple-columns-in-spark-dataframe-in-python) – vvg Jun 13 '18 at 14:18
3 Answers
This might be helpful:
df_cols = list(set(df.columns) - {'<col1>', '<col2>', ...})
df.select(df_cols).show()
 
    
    
– sairamdgr8
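For illustration, here is a self-contained sketch of the same select-based idea; the SparkSession setup, the column names, and the sample rows are all hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame; in practice df might have ~200 columns.
df = spark.createDataFrame(
    [(1, 2, 3, 4), (5, 6, 7, 8)],
    ["a", "b", "c", "d"],
)

# Keep every column except the ones listed in `exclude`.
exclude = {"b", "d"}
df_cols = [c for c in df.columns if c not in exclude]
df.select(df_cols).show()  # shows only columns a and c
```

Note that iterating over `df.columns` with a list comprehension preserves the original column order, whereas the set difference in the answer above does not guarantee any particular order.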
df.drop(*cols_to_drop)

Useful if the list of columns to drop (`cols_to_drop` above) is huge, or if it can be derived programmatically.
 
    
    
– martand
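As a rough, self-contained sketch of that approach, assuming a hypothetical DataFrame and a made-up rule (dropping every column with a given prefix) to derive the list programmatically:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame with a couple of throwaway columns.
df = spark.createDataFrame(
    [(1, "x", 10, 20), (2, "y", 30, 40)],
    ["id", "label", "tmp_a", "tmp_b"],
)

# Derive the drop list programmatically, e.g. every column with a given prefix.
cols_to_drop = [c for c in df.columns if c.startswith("tmp_")]

# DataFrame.drop accepts multiple column names, so unpack the list.
df.drop(*cols_to_drop).show()  # leaves only id and label
```

`drop` is a no-op for names that don't match an existing column, which makes it forgiving when the exclusion list is built programmatically.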
