How can I drop rows in PySpark based on a row number / row index value?
I am new to PySpark (and to coding) -- I have tried writing something, but it is not working.
You can't drop specific rows by index in Spark, but you can keep just the ones you want by using filter or its alias, where.
Imagine you want "to drop" the rows where the age of a person is lower than 3. You can just keep the opposite rows, like this:
df.filter(df.age >= 3)
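If the DataFrame has no row-index column to filter on, you can generate one first with row_number. A minimal sketch, assuming the existing "age" column defines the ordering (the column name row_idx is just an illustration); note the window has no partition, so Spark will move all rows into a single partition to number them:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Assign a 1-based row index ordered by an existing column.
w = Window.orderBy("age")
indexed = df.withColumn("row_idx", F.row_number().over(w))

# "Drop" rows 1 and 3 by keeping every other row.
indexed.filter(~F.col("row_idx").isin(1, 3)).show()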
If your data already has a row-number column, you can filter on it directly (this assumes an active SparkSession named spark):

from pyspark.sql.types import StructType, StructField, IntegerType, StringType
import pyspark.sql.functions as F

schema1 = StructType([StructField('rownumber', IntegerType(), True), StructField('name', StringType(), True)])
data1 = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
df1 = spark.createDataFrame(data1, schema1)
df1.show()
+---------+----+
|rownumber|name|
+---------+----+
|        1|   a|
|        2|   b|
|        3|   c|
|        4|   d|
|        5|   e|
+---------+----+
df1.filter(F.col("rownumber").between(2,4)).show()
+---------+----+
|rownumber|name|
+---------+----+
|        2|   b|
|        3|   c|
|        4|   d|
+---------+----+
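Since the question asks about dropping rows, the same between condition can be negated with ~ to drop rows 2 through 4 and keep the rest:

df1.filter(~F.col("rownumber").between(2, 4)).show()

That leaves only rows 1 and 5.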
