I'm a Spark newbie and I need parallelizePairs() (working in Java).
First, I started my driver with:
SparkSession spark = SparkSession
        .builder()
        .appName("My App")
        .config("driver", "org.postgresql.Driver")
        .getOrCreate();
But spark doesn't have the method I need; spark.sparkContext() only gives me parallelize().
Now I'm tempted to add:
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("My App");
JavaSparkContext context = new JavaSparkContext(sparkConf);
This way, context has the method I need, but now I'm very confused.
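For reference, this is the kind of call I'm ultimately after (just a sketch; the pairs list is made-up data to illustrate the method):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Made-up pairs, only to show the call I need:
List<Tuple2<Integer, String>> pairs = Arrays.asList(
        new Tuple2<>(1, "a"),
        new Tuple2<>(2, "b"));
JavaPairRDD<Integer, String> pairRdd = context.parallelizePairs(pairs);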
First, I never needed a JavaSparkContext before, because I'm running with spark-submit and setting the master address there.
Second, why is spark.sparkContext() not the same as a JavaSparkContext, and how do I get one from the SparkSession?
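Digging through the Javadoc, I found JavaSparkContext.fromSparkContext(), which I assume is meant as the bridge between the two, but I'm not sure:

import org.apache.spark.api.java.JavaSparkContext;

// Wrap the session's Scala SparkContext in the Java-friendly API
// (assuming this is the intended way to get one):
JavaSparkContext context = JavaSparkContext.fromSparkContext(spark.sparkContext());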
If I'm passing the master on the command line, must I still call sparkConf.setMaster("<master-address-again>")?
I already read How to create SparkSession from existing SparkContext and understood the problem, but I really need the builder approach because I have to pass .config("driver", "org.postgresql.Driver") to it.
Please shed some light here...
EDIT
Dataset<Row> graphDatabaseTable = spark.read()
        .format("jdbc")
        .option("url", "jdbc:postgresql://192.168.25.103:5432/graphx")
        .option("dbtable", "public.select_graphs")
        .option("user", "postgres")
        .option("password", "admin")
        .option("driver", "org.postgresql.Driver")
        .load();
SQLContext graphDatabaseContext = graphDatabaseTable.sqlContext();
graphDatabaseTable.createOrReplaceTempView("select_graphs");
String sql = "select * from select_graphs where parameter_id = " + indexParameter;
Dataset<Row> graphs = graphDatabaseContext.sql(sql);
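Putting it all together, this is what I'm hoping is valid, a sketch assuming fromSparkContext() above really is the right bridge:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// One session carrying my JDBC config...
SparkSession spark = SparkSession
        .builder()
        .appName("My App")
        .config("driver", "org.postgresql.Driver")
        .getOrCreate();

// ...and a JavaSparkContext over the same underlying context,
// so parallelizePairs() and spark.read() run in one application.
JavaSparkContext context = JavaSparkContext.fromSparkContext(spark.sparkContext());

Is that the correct approach?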