I have a basic Spark job that does a couple of joins. The three DataFrames being joined are fairly big, with nearly 2 billion records in each of them. I have a Spark infrastructure that automatically scales up nodes whenever necessary. It is a very simple Spark SQL query whose results I write to disk, but the job always gets stuck at 99% when I look at it in the Spark UI.
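To make this concrete, here is roughly what the job looks like. The table names, paths, columns, and the join key `id` are placeholders, not my real schema:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("three-way-join").getOrCreate()

// Each input is roughly 2 billion rows.
spark.read.parquet("/data/a").createOrReplaceTempView("a")
spark.read.parquet("/data/b").createOrReplaceTempView("b")
spark.read.parquet("/data/c").createOrReplaceTempView("c")

// The "very simple" query: two joins, then write the result to disk.
val result = spark.sql("""
  SELECT a.*, b.extra_b, c.extra_c
  FROM a
  JOIN b ON a.id = b.id
  JOIN c ON a.id = c.id
""")

result.write.parquet("/data/output")
```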
Here are the things I have tried so far (a rough sketch of each attempt follows the list):
- Increase the number of `executors` and executor memory.
- Use `repartition` while writing the file.
- Use the native Spark `join` instead of the Spark SQL join, etc.
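Continuing from the sketch above, this is roughly what those attempts looked like; the executor counts, memory sizes, and partition numbers are examples, not my real settings:

```scala
// Larger cluster settings were passed at submit time, e.g.:
//   spark-submit --num-executors 200 --executor-memory 16g ...

// Native DataFrame join API instead of the SQL string:
val dfA = spark.table("a")
val dfB = spark.table("b")
val dfC = spark.table("c")

val joined = dfA
  .join(dfB, Seq("id"))
  .join(dfC, Seq("id"))

// repartition before the write so the final stage is spread over more tasks:
joined.repartition(2000).write.mode("overwrite").parquet("/data/output")
```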
However, none of these things has worked. It would be great if somebody could share their experience of solving this problem. Thanks in advance.