As per Spark docs, spark.sql.shuffle.partitions parameter Configures the number of partitions to use when shuffling data for joins or aggregations.. To control the number of output files use the repartition() method before writing the output. So something like this:
df
.filter(...) // some transformations
.join(...)
.repartition(1) // move data into a single partition
.write
.format(...)
.save(...)
The snippet above would result in a single output file.
You are not limited to repartitioning your data once - you can repartition as much as you need, but bare in mind that this is a costly operation:
df
.filter(...) // some transformations
.repartition(...) // repartition to improve join performance
.join(...)
.repartition(1) // move data into a single partition
.write
.format(...)
.save(...)
If you want a good explanation of how repartition works, here is a great answer:
Spark - repartition() vs coalesce()
For more information on how to improve the performance of the joins, refer to the Spark docs:
https://spark.apache.org/docs/latest/sql-performance-tuning.html#join-strategy-hints-for-sql-queries