I would like to select the item that has the greatest value. For example, in this table I would like to select MAC09:
| Identifiant | Val |
|---|---|
| MAC26 | 36 |
| MAC10 | 9 |
| MAC02 | 2 |
| MAC32 | 11 |
| MAC09 | 37 |
| MAC28 | 10 |
There are several ways of doing it. Here is a solution using a rank:
from pyspark.sql import functions as F, Window

# Rank rows by Val in descending order; rank 1 is the greatest value (ties share rank 1).
w = Window.orderBy(F.col("Val").desc())
df.withColumn("rnk", F.rank().over(w)).where("rnk = 1").drop("rnk").show()
+-----------+---+
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+