I have the following piece of code to save a file on S3:
rdd
        //drop header
        .mapPartitionsWithIndex { (idx, iter) => if (idx == 0) iter.drop(1) else iter }
        //assign the Key for PartitionBy
        //if the Key doesn't exist, assign -1, which means all the data goes to the part-00000 file
        .map(line => if (colIndex == -1) (null, line) else (line.split(TILDE)(colIndex), line))
        .partitionBy(customPartitioner)
        .map { case (_, line) => line }
        //Add Empty columns and Change the order and get the modified string
        .map(line => addEmptyColumns(line, schemaIndexArray))            
        .saveAsTextFile(s"s3a://$bucketName/$serviceName/$folderPath")
When writing to HDFS instead (no S3 path involved), the same code takes about one fifth of the time. Are there any other approaches to fix this? I am setting the Hadoop configuration in Spark.
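For reference, the Hadoop configuration is set through the Spark session, roughly like the sketch below. The values shown are placeholders, not my actual settings; the `fs.s3a.*` and committer keys are the ones documented for the Hadoop S3A connector:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("s3-writer")
  // S3A credentials (placeholder values, not my real keys)
  .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
  .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
  // v2 commit algorithm skips the second rename pass, which is
  // expensive on S3 because "rename" is really copy-plus-delete
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  // buffer output and upload it as parallel multipart blocks
  .config("spark.hadoop.fs.s3a.fast.upload", "true")
  .getOrCreate()
```

I have seen the commit-algorithm and fast-upload settings suggested for slow S3A writes, but I have not confirmed how much they help in this case.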