I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on stackoverflow and the answers I found were all outdated or referred to RDDs. I'd like to use the native dataframe in spark.
        Vedom
        
- He means "df.sort("col1").show(10) also sorts in **ascending** order" – Josiah Yoder Jul 15 '16 at 18:30
- This solution worked perfectly for me: https://stackoverflow.com/a/38575271/5957143 – abc123 Nov 06 '18 at 03:12
6 Answers
You can also sort the column by importing the Spark SQL functions:
import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))
Or:
df.sort(desc("col1"))
Or, by importing sqlContext.implicits._:
import sqlContext.implicits._
df.orderBy($"col1".desc)
Or:
df.sort($"col1".desc)
 
    
    
        Gabber
        
- Also, when you're ordering ascending by all columns, the `asc` keyword is not necessary: `..orderBy("col1", "col2")`. – Dan Mar 04 '20 at 20:03
It's in org.apache.spark.sql.DataFrame for the sort method:
df.sort($"col1", $"col2".desc)
Note the $ and .desc inside sort, marking the column to sort the results by (descending here).
- `import org.apache.spark.sql.functions._` and `import sqlContext.implicits._` also get you a lot of nice functionality. – David Griffin May 19 '15 at 18:14
- @Vedom: Shows a syntax error: `df.sort($"Time1", $"Time2".desc) SyntaxError: invalid syntax` at the $ symbol – kavya Sep 07 '16 at 07:28
- @kaks, need to import functions/implicits as described above to avoid that error – Rimer Nov 01 '17 at 14:01
        PySpark only
I came across this post when looking to do the same in PySpark. The easiest way is to just add the parameter ascending=False:
df.orderBy("col1", ascending=False).show(10)
Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
 
    
    
        Nic Scozzaro
        
- The question is marked with a scala tag, but this answer is for python only, as this syntax as well as the function signature are python-only. – Viacheslav Rodionov Feb 01 '19 at 11:51
import org.apache.spark.sql.functions.{asc, desc}
df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3"))
 
    
    
        Paul Reiners
        
 
    
    
        Nitya Yekkirala
        
- This is a duplicate answer from the one 3 years earlier by @AmitDubey. It should be removed in favor of that one. – WestCoastProjects Jun 30 '19 at 14:38
        In the case of Java:
If we use DataFrames, while applying joins (here an inner join), we can sort (in ascending order) after selecting distinct elements in each DataFrame:
Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");
where e_id is the column the join is applied on, while the result is sorted by salary in ascending order.
Also, we can use Spark SQL as:
SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();
where
- spark -> the SparkSession
- salary -> a global temp view