Here's what my dataframe looks like:
p3.show(false)
CLASS_NAME          ID    CREATED_BY
/SC/ABC/123/abc     123    david
/SC/DEF/456/ghi     456    hannah
... more rows to follow
what I'd like to do is: split CLASS_NAME column and concatenate the first two string parts of it and form a new column: CLIENT_ID and append to the dataframe, desired output is below:
CLASS_NAME          ID    CREATED_BY  CLIENT_ID
/SC/ABC/123/abc     xyz    david      /SC/ABC
/SC/DEF/456/ghi     jfk    hannah     /SC/DEF
... more rows to follow
So far, I was able to follow this answer and split them using below command:
import org.apache.spark.sql.functions.split
import spark.implicits._
val p4 = p3.withColumn("CLIENT_ID", split($"CLASS_NAME", "\\/")).select(
  $"CLIENT_ID".getItem(1).as("col1"),
  $"CLIENT_ID".getItem(2).as("col2"),
  $"CLIENT_ID".getItem(3).as("col3")
)
p4.show(false)
col1 col2 col3
SC   ABC  123
SC   DEF  456
... more rows to follow
But I haven't figured out how to 1. concatenate the two strings; 2. append this new column to the original dataframe.
Any ideas would be greatly appreciated!