After reading through the documentation, I still do not understand how Spark running on YARN accounts for Python memory consumption.
Does it count towards spark.executor.memory, spark.executor.memoryOverhead, or somewhere else?
In particular, I have a PySpark application with spark.executor.memory=25G and spark.executor.cores=4, and I encounter frequent "Container killed by YARN for exceeding memory limits." errors when running a map over an RDD. The map operates on a fairly large number of complex Python objects, so it is expected to take up a non-trivial amount of memory, but nowhere near 25GB. How should I configure the different memory settings for use with memory-heavy Python code?
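For reference, a minimal sketch of roughly how the job is configured (the app name and the commented-out memoryOverhead value are placeholders, not settings I have actually tuned yet):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("heavy-python-rdd-job")  # placeholder name
    # Settings from the question:
    .config("spark.executor.memory", "25g")
    .config("spark.executor.cores", "4")
    # One of the knobs I am asking about; value here is arbitrary:
    # .config("spark.executor.memoryOverhead", "4g")
    .getOrCreate()
)

# The failing stage is essentially a map over an RDD of complex Python objects:
# rdd.map(heavy_python_function).count()
```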
