I have two datasets named dataset1 and dataset2. dataset1 looks like this:
empid empname
101 john
102 kevin
and dataset2 looks like this:
empid empmarks empaddress
101 75 LA
102 69 NY
dataset2 will be very large, and I need to run some operations across both datasets and get results that combine them.
As far as I know, I have two options for processing these datasets:
1. Store dataset1 (the smaller one) as a Hive lookup table and process both datasets through Spark.
2. Use Spark broadcast variables to process these datasets.
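To make option 2 concrete, here is a rough plain-Python sketch of the lookup-style join I have in mind. It does not use Spark at all: the dict stands in for a broadcast copy of dataset1, the nested lists stand in for partitions of dataset2, and the names `dataset2_partitions` and `join_partition` are made up for illustration.

```python
# Plain-Python sketch of a broadcast-style (map-side) lookup join.
# No Spark here: "partitions" are just lists, and the dict stands in
# for a broadcast variable that each worker would hold in memory.

# dataset1 (small): empid -> empname
dataset1 = {101: "john", 102: "kevin"}

# dataset2 (large), split into hypothetical partitions of
# (empid, empmarks, empaddress) rows
dataset2_partitions = [
    [(101, 75, "LA")],
    [(102, 69, "NY")],
]

def join_partition(rows, lookup):
    """Join one partition against the in-memory lookup; no shuffle is needed."""
    for empid, empmarks, empaddress in rows:
        empname = lookup.get(empid)
        if empname is not None:  # inner join: skip ids missing from dataset1
            yield (empid, empname, empmarks, empaddress)

# Each partition is processed independently against the same small lookup.
joined = [
    row
    for part in dataset2_partitions
    for row in join_partition(part, dataset1)
]
print(joined)  # [(101, 'john', 75, 'LA'), (102, 'kevin', 69, 'NY')]
```

In Spark itself I assume the dict would be wrapped with `sc.broadcast(...)` and read through `.value` inside a `mapPartitions` function, so dataset2 is never shuffled. This only works if dataset1 fits comfortably in memory on every executor.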
Could anyone suggest which of these is the better option?