I'm trying to create a flag in my dataset based on 2 conditions, the first is simple. Does CheckingCol = CheckingCol2.
The second is more complicated. I have a column called TranID and a column called RevID.
For nay row if RevID is in TranID AND CheckingCol = CheckingCol2 then the flag should return "Yes". Otherwise the flag should return "No".
My data looks like this:
TranID   RevID   CheckingCol  CheckingCol2
1        2       ABC          ABC
2        1       ABC          ABC
3        6       ABCDE        ABCDE
4        3       ABCDE        ABC
5        7       ABCDE        ABC
The expected result would be:
TranID   RevID   CheckingCol  CheckingCol2  Flag
1        2       ABC          ABC           Yes
2        1       ABC          ABC           Yes
3        6       ABCDE        ABCDE         No
4        3       ABCDE        ABC           No
5        7       ABCDE        ABC           No
I've tried using:
df.withColumn("TotalMatch", when((col("RevID").contains(col("TranID"))) & (col("CheckingColumn") == col("CheckingColumn2")), "Yes").otherwise("No"))
But it didn't work, and I've not been able to find anything online about how to do this.
Any help would be great!
 
    