I'm using Spark with Scala and have a DataFrame:
Source | Column1 | Column2
A      | ...     | ...
B      | ...     | ...
B      | ...     | ...
C      | ...     | ...
B      | ...     | ...
C      | ...     | ...
A      | ...     | ...
I was looking into partitionBy (https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/sql/DataFrameWriter.html), but I have a specific requirement: I need to save each partition to a separate directory. Ideally, it would look something like this:
df.write.partitionBy("Source").saveAsTable(s"$CURRENT_SOURCE_VALUE")
Is it possible to accomplish this using partitionBy, or should I try something else, like looping over each group with an RDD, or possibly groupBy? Any pointers would be helpful; I'm fairly new to Apache Spark. I'm after something like this (https://stackoverflow.com/a/43998102), but I don't think that approach carries over to Spark with Scala.
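For example, one approach I was considering is to collect the distinct source values to the driver and then filter and write each group separately. A rough sketch (assuming df is the DataFrame above, the set of distinct sources is small enough to collect, and the base path and Parquet format are just placeholders):

// Collect the distinct values of the Source column to the driver
val sources = df.select("Source").distinct().collect().map(_.getString(0))

// Filter the DataFrame once per source and write each group to its own directory
sources.foreach { src =>
  df.filter(df("Source") === src)
    .write
    .mode("overwrite")
    .parquet(s"/some/base/path/$src") // e.g. /some/base/path/A, /some/base/path/B, ...
}

This scans the full DataFrame once per source, though, so I'm not sure it's the idiomatic way.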
EDIT
The location (path) for each source will come from a separate map like so:
val sourceLocation: Map[String, String] = Map(
  "A" -> "/MyCustomPathForA/.../",
  "B" -> "/MyCustomPathForB/.../",
  "C" -> "/MyCustomPathForC/.../"
) // key: source name; value: path, where each base path (root) could be different
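For reference, a rough sketch of how I imagine consuming this map (again assuming df is the DataFrame above; the Parquet format is just a placeholder):

// Write each source's rows to its own custom root path from the map;
// sources missing from the map are simply not written here
sourceLocation.foreach { case (src, path) =>
  df.filter(df("Source") === src)
    .write
    .mode("overwrite")
    .parquet(path)
}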