I am trying to concatenate multiple columns in Spark using the concat_ws function.
For example, below is the table to which I have to add a new concatenated column:
table - **t**
+---+----+  
| id|name|
+---+----+  
|  1|   a|  
|  2|   b|
+---+----+
and below is the table that specifies, for each id, which columns are to be concatenated (for id 1 the columns id and name need to be concatenated, and for id 2 only id):
table - **r**
+---+-------+
| id|    att|
+---+-------+
|  1|id,name|
|  2|     id|
+---+-------+
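For reference, here is a minimal sketch that reproduces both tables (assuming a SparkSession named spark, whose implicits are needed for toDF):

```scala
import spark.implicits._
import org.apache.spark.sql.functions._

// the data table t and the lookup table r from above
val t = Seq((1, "a"), (2, "b")).toDF("id", "name")
val r = Seq((1, "id,name"), (2, "id")).toDF("id", "att")
```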
If I join the two tables and do something like the following, I can concatenate, but not based on table r (the new column comes out as 1,a for the first row, but for the second row it should be just 2):
val joined = t.join(r, "id")
joined.withColumn("new", concat_ws(",", joined.select("att").first.mkString.split(",").map(c => col(c)): _*)).show
+---+----+-------+---+
| id|name|    att|new|
+---+----+-------+---+
|  1|   a|id,name|1,a|
|  2|   b|     id|2,b|
+---+----+-------+---+
I need to apply a filter before the select in the query above, but I am not sure how to do that inside withColumn for each row.
Something like the following, if that is even possible:
joined.withColumn("new", concat_ws(",", joined.**filter**("id=" + this.id).select("att").first.mkString.split(",").map(c => col(c)): _*)).show
because each row needs to be filtered based on its own id.
scala> t.filter("id=1").select("att").first.mkString.split(",").map(c => col(c))
res90: Array[org.apache.spark.sql.Column] = Array(id, name)
scala> t.filter("id=2").select("att").first.mkString.split(",").map(c => col(c))
res89: Array[org.apache.spark.sql.Column] = Array(id)
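In other words, I can pull the whole id-to-att mapping onto the driver (a sketch assuming r is small enough to collect), but I still do not see how to apply a different column list per row:

```scala
// collect the lookup table into a driver-side map: id -> comma-separated column list
val attById = r.collect().map(row => row.getInt(0) -> row.getString(1)).toMap
// attById: Map(1 -> "id,name", 2 -> "id")
```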
Below is the final required result.
+---+----+-------+---+
| id|name|    att|new|
+---+----+-------+---+
|  1|   a|id,name|1,a|
|  2|   b|     id|  2|
+---+----+-------+---+
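For what it is worth, the only workaround I have sketched so far is a chained when expression built from the distinct att values (assuming that set is small); I would still like to know if there is a cleaner per-row way:

```scala
import org.apache.spark.sql.functions._

val joined = t.join(r, "id")

// one when(...) branch per distinct att value, falling back to null
val attValues = r.select("att").distinct.collect().map(_.getString(0))
val newCol = attValues.foldLeft(lit(null).cast("string")) { (acc, att) =>
  when(col("att") === att, concat_ws(",", att.split(",").map(c => col(c)): _*)).otherwise(acc)
}

joined.withColumn("new", newCol).show
```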