I have a large RDD and want to create 4 different RDDs from it, based on provided lists of headers, and save each one to an Impala table by writing 4 Parquet files.
The data looks like this:
a    b   c   d   e   f   g   h
--------------------------------
abc  1   3   4   5   7   9   11
xyz  2   5   7   4   9   4   12
I have a list of columns for each Impala-side table:
table 1 impala side :- a, b, c
table 2 impala side :- d, e, f
...
I also need to add a new column to each table for a user-defined primary key, like:
table 1 impala side :- id, a, b, c
I tried the rdd.map function, but how do I apply it to a specific list of columns?
rdd_1 = rdd.map(lambda x: (x['a'], x['b'], x['c']))
Also, how do I add a new column with a different primary key for each table?
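To make it concrete, here is a rough sketch of the result I am after, not a working solution. It assumes each row behaves like a dict keyed by header, and that a running row number is acceptable as the id. The names TABLE_COLUMNS, project and with_id are made up for illustration, and a plain Python list stands in for the RDD so the snippet runs without a cluster.

```python
# Hypothetical column lists, one per Impala-side table (taken from the question).
TABLE_COLUMNS = {
    "table_1": ["a", "b", "c"],
    "table_2": ["d", "e", "f"],
}


def project(row, columns):
    """Return the values of `columns` from a dict-like row, in list order."""
    return tuple(row[c] for c in columns)


def with_id(indexed_row, columns, start=1):
    """Turn a (row, index) pair into (id, col1, col2, ...).

    The (row, index) shape matches what RDD.zipWithIndex() produces.
    Assumption: a running number starting at `start` is a valid primary key.
    """
    row, index = indexed_row
    return (index + start,) + project(row, columns)


# Plain-Python stand-in for the RDD, using the two sample rows above.
rows = [
    {"a": "abc", "b": 1, "c": 3, "d": 4, "e": 5, "f": 7, "g": 9, "h": 11},
    {"a": "xyz", "b": 2, "c": 5, "d": 7, "e": 4, "f": 9, "g": 4, "h": 12},
]

cols = TABLE_COLUMNS["table_1"]
# zip(rows, range(...)) mimics zipWithIndex: pairs of (row, 0-based index).
table_1 = [with_id(pair, cols) for pair in zip(rows, range(len(rows)))]
print(table_1)

# With a real RDD, the same helpers would presumably be used as:
#   rdd_1 = rdd.zipWithIndex().map(lambda p: with_id(p, TABLE_COLUMNS["table_1"]))
```

The open part for me is whether this generalizes cleanly to all 4 column lists and to primary keys that are not just a row number.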
 
    