I am running a Spark analytics application and reading an MSSQL Server table (the whole table) directly using Spark JDBC. The table has more than 30M records but does not have a primary key column or any integer column. Since the table lacks such a column, I cannot use the partitionColumn option, so reading the table takes too much time.
val datasource = spark.read.format("jdbc")
                .option("url", "jdbc:sqlserver://host:1433;database=mydb")
                .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
                .option("dbtable", "dbo.table")
                .option("user", "myuser")
                .option("password", "password")
                .option("useSSL", "false").load()
Is there any way to improve performance in such a case and use parallelism while reading data from relational database sources? (The source could be Oracle, MSSQL Server, MySQL, or DB2.)
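For reference, this is how I would parallelize the read if a suitable numeric column existed — the column name and bounds below are just placeholders, since my table has no such column:

```scala
val datasource = spark.read.format("jdbc")
                .option("url", "jdbc:sqlserver://host:1433;database=mydb")
                .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
                .option("dbtable", "dbo.table")
                .option("user", "myuser")
                .option("password", "password")
                // these four options split the read into parallel JDBC queries,
                // but they require a numeric/date/timestamp column to range over
                .option("partitionColumn", "some_numeric_col") // hypothetical column
                .option("lowerBound", "1")
                .option("upperBound", "30000000")
                .option("numPartitions", "10")
                .load()
```

Without partitionColumn, Spark issues a single query over one connection, which is why the read is so slow.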