I have the following data:
col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a1      b2      c1      d1
a1      b3      c1      d1
a1      b4      c1      d2
a1      b5      c2      d2
a1      b6      c2      d2
a1      b7      c1      d3
a1      b8      c2      d3
a1      b9      c3      d3
a1      b10     c1      d2
a1      b11     c2      d3
a2      b12     c1      d1
a3      b13     c1      d1
I am interested in being able to:
- Return rows where the value for col_1is unique
- For each row in the result, it should return the values for the columnns that have the highest counts when grouping by: col_3,col_4
For example, I would like the output to return the following:
col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a2      b12     c1      d1
a3      b13     c1      d1
Notice in the result that each value in col_1 is unique. Also note that for a1, it returned with c1 and d1 as they had the highest group by counts for a1.
How can I achieve this by SQL query? I will be using it for a Hive SQL query.
 
     
     
    