Can someone explain the differences between --packages and --jars in a spark-submit script?
nohup ./bin/spark-submit --jars ./xxx/extrajars/stanford-corenlp-3.8.0.jar,./xxx/extrajars/stanford-parser-3.8.0.jar \
--packages datastax:spark-cassandra-connector_2.11:2.0.7 \
--class xxx.mlserver.Application \
--conf spark.cassandra.connection.host=192.168.0.33 \
--conf spark.cores.max=4 \
--master spark://192.168.0.141:7077 ./xxx/xxxanalysis-mlserver-0.1.0.jar 1000 > ./logs/nohup.out &
Also, do I require the--packages configuration if the dependency is in my applications pom.xml? (I ask because I just blew up my applicationon by changing the version in --packages while forgetting to change it in the pom.xml)
I am using the --jars currently because the jars are massive (over 100GB) and thus slow down the shaded jar compilation. I admit I am not sure why I am using --packages other than because I am following datastax documentation