I have been using PySpark and have a problem with its logging: log messages from the Spark JVM side are written to STDOUT, and I have no control over that from Python.
For example, logs such as this one are being piped to STDOUT instead of STDERR:
2018-03-12 09:50:10 WARN Utils:66 - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
Spark is not installed standalone in the environment; only Python and the PySpark package are.
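For context, a minimal setup of the kind I am running looks roughly like this (the app name and the action are just placeholders):

```python
from pyspark.sql import SparkSession

# Minimal session; the WARN lines from the Spark JVM show up on STDOUT,
# interleaved with my own program's output.
spark = SparkSession.builder.appName("example").getOrCreate()
spark.range(10).collect()
```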
How do I:
A. Redirect all logs to STDERR
OR
B. If that is not possible, disable the logs.
Things I have tried:
- I have tried to use `pyspark.SparkConf()`, but nothing I configure there seems to work (see the sketch after this list).
- I have tried creating `SparkEnv.conf` and setting `SPARK_CONF_DIR` to match, just to check if I could at least disable the example log above, to no avail.
- I have tried looking at the documentation, but found no indication of how to accomplish what I am trying.
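For reference, this is roughly the kind of thing I attempted with `pyspark.SparkConf()`; the specific keys below are ones I experimented with, not settings I know to be correct for this problem:

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Attempted configuration; none of this changed where the JVM log lines end up.
conf = (
    SparkConf()
    .set("spark.debug.maxToStringFields", "200")       # the setting named in the WARN message
    .set("spark.ui.showConsoleProgress", "false")
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

# This only changes the log level, not the stream the logs are written to.
spark.sparkContext.setLogLevel("ERROR")
```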