I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on stackoverflow and the answers I found were all outdated or referred to RDDs. I'd like to use the native dataframe in spark.
        Vedom
        
- He means "df.sort("col1").show(10) also sorts in **ascending** order" – Josiah Yoder Jul 15 '16 at 18:30
- This solution worked perfectly for me: https://stackoverflow.com/a/38575271/5957143 – abc123 Nov 06 '18 at 03:12
6 Answers
You can also sort the column by importing the Spark SQL functions:
import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))
Or:
df.sort(desc("col1"))
Or, by importing sqlContext.implicits._:
import sqlContext.implicits._
df.orderBy($"col1".desc)
Or:
df.sort($"col1".desc)
 
    
    
        Gabber
        
- Also, when you're ordering ascending by all columns, the `asc` keyword is not necessary: `..orderBy("col1", "col2")`. – Dan Mar 04 '20 at 20:03
It's in org.apache.spark.sql.DataFrame for the sort method:
df.sort($"col1", $"col2".desc)
Note the $ and .desc inside sort, marking the column to sort the results by (descending here).
- `import org.apache.spark.sql.functions._` and `import sqlContext.implicits._` also get you a lot of nice functionality. – David Griffin May 19 '15 at 18:14
- @Vedom: Shows a syntax error: `df.sort($"Time1", $"Time2".desc) SyntaxError: invalid syntax` at the $ symbol – kavya Sep 07 '16 at 07:28
- @kaks, need to import functions/implicits as described above to avoid that error – Rimer Nov 01 '17 at 14:01
        PySpark only
I came across this post when looking to do the same in PySpark. The easiest way is to just add the parameter ascending=False:
df.orderBy("col1", ascending=False).show(10)
Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
 
    
    
        Nic Scozzaro
        
- The question is marked with a scala tag, but this answer is for python only, as this syntax as well as the function signature are python-only. – Viacheslav Rodionov Feb 01 '19 at 11:51
import org.apache.spark.sql.functions.{asc, desc}
df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3"))
 
    
    
        Paul Reiners
        
 
    
    
        Nitya Yekkirala
        
- This is a duplicate answer from the one 3 years earlier by @AmitDubey. It should be removed in favor of that one. – WestCoastProjects Jun 30 '19 at 14:38
        In the case of Java:
If we use DataFrames, while applying joins (here an inner join), we can sort (in ascending order) after selecting distinct elements in each DataFrame:
Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");
where e_id is the column the join is applied on, while the result is sorted by salary in ascending order.
Also, we can use Spark SQL as:
SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();
where
- spark -> the SparkSession
- salary -> a global temp view