After specifying a configuration file in spark-submit, as in this answer:
# /job/log4j.properties is the path inside the docker container
spark-submit \
    --master local \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --py-files ./dist/src-1.0-py3-none-any.whl \
    --files "/job/log4j.properties" \
    main.py -input $1 -output $2 -mapper $3 $4  # app args
With the dockerized application structure being:
job/
|--  entrypoint.sh
|--  log4j.properties
|--  main.py
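For reference, log4j.properties is roughly the following (a minimal sketch; the appender details are assumptions, the point is that the root level is raised to WARN so Spark's INFO logs are hidden):
# minimal sketch; only the WARN root level matters for this question
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n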
I'm getting the following error:
log4j:ERROR Ignoring configuration file [file:/log4j.properties].
log4j:ERROR Could not read configuration file from URL [file:/log4j.properties].
java.io.FileNotFoundException: /log4j.properties (No such file or directory)
It works fine if I set the configuration from the Spark context via PropertyConfigurator.configure:
log4j = sc._jvm.org.apache.log4j
log4j.PropertyConfigurator.configure("/job/log4j.properties")
logger = log4j.Logger.getLogger("MyLogger")
That is, all Spark INFO-level logging is silenced and I only see warnings and errors, which is what I've set in the configuration file. However, if I just instantiate a logger as follows (the desired behaviour):
log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger("MyLogger")
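which is then used roughly like this (illustrative calls only; the actual messages are placeholders):
logger.info("should be suppressed by the WARN threshold in log4j.properties")
logger.warn("should still be printed")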
it doesn't behave as it does when configured via PropertyConfigurator.configure, i.e. Spark's INFO-level logging is not silenced even though that's what the configuration file specifies. Any idea how to make the logging configuration passed to spark-submit control the application's logs?
Using pyspark with Spark 3.0.1 and Python 3.8.0.