I have a very large DataFrame with 8,000 columns and 50,000 rows.
I want to write its summary statistics to an Excel file.
I think the describe() method can produce them, but how do I write the result to Excel in a readable format? Thanks.
- Excel can open a CSV (comma-separated values) file as an ordinary spreadsheet, so the easiest thing is to print the output as comma-separated values and then open it with Excel. – Robert Dodier Apr 21 '17 at 17:03
- True, but it's best to convert it to a pandas DataFrame first so you don't have to worry about part files. – David Apr 21 '17 at 17:32
1 Answer
describe() returns a PySpark DataFrame. The easiest way to get it into an Excel-readable format is to convert it to a pandas DataFrame and then write that out as a CSV file, as below:
import pandas
df.describe().toPandas().to_csv('fileOutput.csv')
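If you want it in Excel format, you can try the following:
import pandas
df.describe().toPandas().to_excel('fileOutput.xlsx', sheet_name='Sheet1', index=False)
Note that writing .xlsx requires the openpyxl package (pip install openpyxl on the command line). The legacy .xls format needs xlwt instead, which pandas 2.0 and later no longer support, and .xls is limited to 256 columns, so a summary of 8,000 columns would not fit.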
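With 8,000 columns the summary comes out as 5 rows by roughly 8,000 columns, which is hard to read in any spreadsheet. One way to make it friendlier is to transpose it so each original column becomes a row. The following is a minimal sketch, not part of the original answer: tidy_summary is a hypothetical helper name, and it assumes the frame it receives is the result of df.describe().toPandas(), where Spark stores the statistic names in a column called "summary" and returns every value as a string.

```python
import pandas as pd


def tidy_summary(summary: pd.DataFrame) -> pd.DataFrame:
    """Reshape Spark's describe() output (after toPandas()) so that each
    original column is one row and each statistic is one column.

    `tidy_summary` is a hypothetical helper, not a pandas or PySpark API.
    """
    # Spark keeps the statistic names (count, mean, stddev, min, max) in a
    # column called "summary"; make it the index, then transpose.
    tidy = summary.set_index("summary").T
    tidy.index.name = "column"
    tidy.columns.name = None
    # describe() returns strings; convert to numbers so Excel treats them
    # as numeric cells. Non-numeric values (e.g. min/max of a string
    # column) become NaN, which is an assumption you may not want.
    return tidy.apply(pd.to_numeric, errors="coerce")


# Hypothetical usage, assuming `df` is the Spark DataFrame from the question
# and openpyxl is installed:
# tidy_summary(df.describe().toPandas()).to_excel("fileOutput.xlsx", sheet_name="Sheet1")
```

The index is kept when writing here (no index=False), since after the transpose it holds the original column names.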
        David
- Thanks for the reply. I had tried this, but the output in the CSV file does not look very user-friendly or readable, so I wanted it in Excel format. Thanks. – Ajg Apr 21 '17 at 17:51
- Can we use the above to write DataFrames to multiple tabs in one Excel workbook? – Bharath Feb 26 '18 at 16:37
- You'd have to refactor the code a bit. See https://stackoverflow.com/questions/14225676/save-list-of-dataframes-to-multisheet-excel-spreadsheet – David Feb 26 '18 at 19:05
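For the multiple-tabs case, a rough sketch of that refactoring using pandas.ExcelWriter could look like the following. It is an illustration rather than the linked answer's code: write_sheets and safe_sheet_name are hypothetical helper names, the frames are assumed to be pandas DataFrames already (for example from toPandas()), and openpyxl is assumed to be installed for .xlsx output.

```python
import pandas as pd


def safe_sheet_name(name: str) -> str:
    """Truncate a name to Excel's 31-character sheet-name limit.

    Hypothetical helper; it does not strip characters Excel forbids
    in sheet names, such as ':' or '/'.
    """
    return name[:31]


def write_sheets(frames: dict, path: str) -> None:
    """Write each pandas DataFrame in `frames` to its own sheet of one workbook.

    `frames` maps sheet names to DataFrames. Hypothetical helper.
    """
    # ExcelWriter keeps the workbook open so several to_excel calls can
    # target the same file; the context manager saves it on exit.
    with pd.ExcelWriter(path) as writer:
        for name, frame in frames.items():
            frame.to_excel(writer, sheet_name=safe_sheet_name(name), index=False)


# Hypothetical usage, assuming `df1` and `df2` are Spark DataFrames:
# write_sheets(
#     {"first": df1.describe().toPandas(), "second": df2.describe().toPandas()},
#     "fileOutput.xlsx",
# )
```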
