All the methods collected with examples
Intro
There are actually many ways to do it.
Some are harder than others, and it is up to you which one suits you best. I will try to showcase them all.
#1 Programmatically in your app
Seems to be the easiest, but you will need to recompile your app to change those settings. Personally, I don't like it but it works fine.
Example:
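import org.apache.log4j.{Level, Logger}
// Default level for everything that is not configured explicitly
val rootLogger = Logger.getRootLogger()
rootLogger.setLevel(Level.ERROR)
// Per-package overrides for Spark's own loggers
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.spark-project").setLevel(Level.WARN)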
You can achieve much more by using the log4j API directly.
Source: [Log4J Configuration Docs, Configuration section]
#2 Pass log4j.properties during spark-submit
This one is very tricky, but not impossible. And my favorite.
During app startup, Log4J always looks for a log4j.properties file on the classpath and loads it.
However, when using spark-submit, the Spark cluster's classpath takes precedence over the app's classpath! This is why putting this file in your fat-jar will not override the cluster's settings!
Add -Dlog4j.configuration=<location of configuration file> to
spark.driver.extraJavaOptions (for the driver) or
spark.executor.extraJavaOptions (for executors).
Note that if using a
file, the file: protocol should be explicitly provided, and the file
needs to exist locally on all the nodes.
To satisfy the last condition, you can either upload the file to a location available to all the nodes (like HDFS) or, if using deploy-mode client, access it locally from the driver. Otherwise:
upload a custom log4j.properties using spark-submit, by adding it to
the --files list of files to be uploaded with the application.
Source: Spark docs, Debugging
Steps:
Example log4j.properties:
# Blacklist all to warn level
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Whitelist our app to info :)
log4j.logger.com.github.atais=INFO
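Whitelist lines like the one above all follow the same pattern, so they can be generated instead of typed by hand. This is only a minimal sketch and not part of the original setup: logger_line is a hypothetical helper name, and the accepted level names are assumed to be the standard log4j 1.x ones.

```shell
#!/bin/sh
# Hypothetical helper: print one log4j 1.x logger override line
# of the form log4j.logger.<package>=<LEVEL>.
logger_line() {
    pkg="$1"
    level="$2"
    # Accept only the standard log4j 1.x level names.
    case "$level" in
        ALL|TRACE|DEBUG|INFO|WARN|ERROR|FATAL|OFF) ;;
        *) echo "error: unknown level: $level" >&2; return 1 ;;
    esac
    printf 'log4j.logger.%s=%s\n' "$pkg" "$level"
}
```

For example, the output of logger_line com.github.atais INFO could be appended to log4j.properties with >>.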
Executing spark-submit, for cluster mode:
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --files "/absolute/path/to/your/log4j.properties" \
    --class com.github.atais.Main \
    "SparkApp.jar"
Note that in client mode you must use --driver-java-options for the driver. Source: Spark docs, Runtime Environment
Executing spark-submit, for client mode:
spark-submit \
    --master yarn \
    --deploy-mode client \
    --driver-java-options "-Dlog4j.configuration=file:/absolute/path/to/your/log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --files "/absolute/path/to/your/log4j.properties" \
    --class com.github.atais.Main \
    "SparkApp.jar"
Notes:
- Files uploaded to the Spark cluster with --files will be available at the root dir, so there is no need to add any path in file:log4j.properties.
- Files listed in --files must be provided with an absolute path!
- The file: prefix in the configuration URI is mandatory.
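The notes above can be checked mechanically before submitting. The following is a minimal sketch under stated assumptions, not a definitive script: log4j_submit_args is a hypothetical helper name, and it only prints the logging-related options from the cluster-mode example, one per line, after verifying that the path is absolute and the file exists.

```shell
#!/bin/sh
# Hypothetical helper: print the logging-related spark-submit options
# for cluster mode, given an absolute path to log4j.properties.
log4j_submit_args() {
    props="$1"
    # --files needs an absolute path.
    case "$props" in
        /*) ;;
        *) echo "error: path must be absolute: $props" >&2; return 1 ;;
    esac
    # The file has to exist locally so spark-submit can upload it.
    if [ ! -f "$props" ]; then
        echo "error: no such file: $props" >&2
        return 1
    fi
    # Uploaded files are available at the root dir, so only the bare
    # file name follows the mandatory file: prefix.
    name=$(basename "$props")
    printf '%s\n' \
        "--conf" "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:$name" \
        "--conf" "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:$name" \
        "--files" "$props"
}
```

Running log4j_submit_args /absolute/path/to/your/log4j.properties prints the options so they can be inspected before being added to the spark-submit command line.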
#3 Edit cluster's conf/log4j.properties
This changes the global logging configuration file.
update the $SPARK_CONF_DIR/log4j.properties file and it will be
automatically uploaded along with the other configurations.
Source: Spark docs, Debugging
To find your SPARK_CONF_DIR you can use spark-shell:
atais@cluster:~$ spark-shell 
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/   
scala> System.getenv("SPARK_CONF_DIR")
res0: String = /var/lib/spark/latest/conf
Now just edit /var/lib/spark/latest/conf/log4j.properties (with example from method #2) and all your apps will share this configuration.
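If starting spark-shell is inconvenient, the same directory can usually be resolved from the environment. A minimal sketch, assuming that Spark falls back to SPARK_HOME/conf when SPARK_CONF_DIR is not set (the default location named in the Spark configuration docs); spark_conf_dir is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the directory Spark is expected to read
# its configuration from.
# Assumption: $SPARK_HOME/conf is used when SPARK_CONF_DIR is unset or empty.
spark_conf_dir() {
    if [ -n "${SPARK_CONF_DIR:-}" ]; then
        printf '%s\n' "$SPARK_CONF_DIR"
    elif [ -n "${SPARK_HOME:-}" ]; then
        printf '%s\n' "$SPARK_HOME/conf"
    else
        echo "error: neither SPARK_CONF_DIR nor SPARK_HOME is set" >&2
        return 1
    fi
}
```

Listing the result with ls "$(spark_conf_dir)" shows whether a log4j.properties file is already present there.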
#4 Override configuration directory
If you like solution #3 but want to customize it per application, you can copy the conf folder, edit its contents, and specify it as the root configuration during spark-submit.
To specify a different configuration directory other than the default “SPARK_HOME/conf”, you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory.
Source: Spark docs, Configuration
Steps:
- Copy the cluster's conf folder (more info in method #3)
- Edit log4j.properties in that folder (example in method #2)
- Set SPARK_CONF_DIR to this folder before executing spark-submit, for example:
export SPARK_CONF_DIR=/absolute/path/to/custom/conf
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.github.atais.Main \
    "SparkApp.jar"
 
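The copy step can be sketched as a small shell function. This is an illustration under stated assumptions rather than a definitive script: clone_spark_conf is a hypothetical helper name, it refuses to overwrite an existing target, and editing log4j.properties in the copy remains a manual step.

```shell
#!/bin/sh
# Hypothetical helper: copy a Spark conf directory so it can be customized
# per application, then print the absolute path of the copy.
clone_spark_conf() {
    src="$1"   # the cluster's conf folder (see method #3)
    dst="$2"   # new folder for this application; must not exist yet
    if [ ! -d "$src" ]; then
        echo "error: not a directory: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "error: target already exists: $dst" >&2
        return 1
    fi
    # -R copies the whole tree (spark-defaults.conf, spark-env.sh, log4j.properties, etc).
    cp -R "$src" "$dst" || return 1
    # Print an absolute path, ready to be exported as SPARK_CONF_DIR.
    (cd "$dst" && pwd)
}
```

After editing log4j.properties in the copy, the printed path is the value to export as SPARK_CONF_DIR before running spark-submit.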
Conclusion
I am not sure if there is any other method, but I hope this covers the topic from A to Z. If not, feel free to ping me in the comments!
Enjoy your way!