I am learning the Hadoop environment, so sorry if these are silly questions!
I stored data (the Kaggle Outbrain click prediction dataset) in Hive, and I used RDDs.
Now I want to use Zeppelin's spark2.pyspark interpreter so that I can use Python functions.
When I query the data with %jdbc(hive), I can see it.
My questions are:
How can I create a DataFrame to work with in Zeppelin? Do I have to create a DataFrame at all?
How can I start the Python analysis part? If I make any changes, will they affect the Hive data?
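
For reference, this is roughly what I expect the PySpark side to look like. It is only a sketch under assumptions: the helper name `load_hive_table` and the table name `outbrain.clicks_train` are hypothetical placeholders, and I am assuming the paragraph runs under `%spark2.pyspark`, where Zeppelin provides a `spark` session with Hive support enabled.

```python
# Sketch only: meant to run in a Zeppelin paragraph that starts with %spark2.pyspark,
# where a SparkSession named `spark` is assumed to be predefined by the interpreter.

def load_hive_table(spark, table_name, limit=None):
    """Return a Spark DataFrame backed by an existing Hive table.

    `table_name` is whatever name the table has in the Hive metastore.
    `limit` optionally restricts the number of rows, which is handy
    while exploring a large dataset.
    """
    df = spark.table(table_name)  # lazily references the table, no copy is made
    if limit is not None:
        df = df.limit(limit)      # returns a new DataFrame, the table is untouched
    return df

# Hypothetical usage (table name is a placeholder, not my real one):
# clicks = load_hive_table(spark, "outbrain.clicks_train", limit=1000)
# clicks.printSchema()
# pdf = clicks.toPandas()  # small samples only; this pulls rows to the driver
```

As far as I understand, transformations on a DataFrame return new DataFrames and do not modify the Hive table unless I explicitly write the result back (for example with `df.write`), but I would like to confirm this.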