I am trying to run two functions that perform completely independent transformations on a single RDD, in parallel, using PySpark. What are some ways to do this?
from multiprocessing import Process

from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext

def doXTransforms(sampleRDD):
    # (X transforms)
    pass

def doYTransforms(sampleRDD):
    # (Y transforms)
    pass

if __name__ == "__main__":
    sc = SparkContext(appName="parallelTransforms")
    sqlContext = SQLContext(sc)
    hive_context = HiveContext(sc)
    rows_rdd = hive_context.sql("select * from tables.X_table")

    # Attempt: run each transform pipeline in its own OS process
    p1 = Process(target=doXTransforms, args=(rows_rdd,))
    p1.start()
    p2 = Process(target=doYTransforms, args=(rows_rdd,))
    p2.start()
    p1.join()
    p2.join()
    sc.stop()
This does not work, and I now understand why it will not work (as far as I can tell, the SparkContext and RDDs cannot be shared with child processes spawned by multiprocessing). But is there an alternative way to achieve this? Specifically, are there any PySpark-specific solutions?
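For example, would swapping the processes for driver-side threads work? Here is a minimal sketch of what I have in mind, reusing doXTransforms, doYTransforms, and rows_rdd from the snippet above (the thread-safety claim is from Spark's job-scheduling docs; I have not verified that the two jobs actually overlap):

from threading import Thread

# Spark's scheduler is documented to be thread-safe, so actions submitted
# from separate driver threads share one SparkContext and can be scheduled
# concurrently. Whether they truly run in parallel depends on available
# executor resources and on spark.scheduler.mode (e.g. FAIR vs. FIFO).
t1 = Thread(target=doXTransforms, args=(rows_rdd,))
t2 = Thread(target=doYTransforms, args=(rows_rdd,))
t1.start()
t2.start()
t1.join()
t2.join()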