I want to remove consecutive duplicates from an array in Hive.
collect_list() keeps all duplicates, while collect_set() keeps only distinct entries. I need something in between: keep the values in order, but collapse each run of repeated values into a single entry.
For example, given the table below:
id  |  number
==============
fk        4
fk        4
fk        2
4f        1
4f        8
4f        8
h9        7
h9        4
h9        7
I would like to get something like this:
id | aggregate
===========================
fk   Array<int>(4,2)
4f   Array<int>(1,8)
h9   Array<int>(7,4,7)
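
To pin down the behavior I mean, here is a minimal sketch in Python rather than Hive. The function name collapse_consecutive is made up for illustration, and the sketch assumes the rows are processed in exactly the order shown in the table above (in Hive that would presumably need some explicit ordering column, which the example table does not have).

```python
# Rows in the order they appear in the example table (assumed row order).
rows = [
    ("fk", 4), ("fk", 4), ("fk", 2),
    ("4f", 1), ("4f", 8), ("4f", 8),
    ("h9", 7), ("h9", 4), ("h9", 7),
]


def collapse_consecutive(rows):
    """Group numbers by id, dropping a number only when it repeats the
    immediately preceding number for the same id."""
    result = {}
    for row_id, number in rows:
        values = result.setdefault(row_id, [])
        # Append unless this value equals the last one kept for this id.
        if not values or values[-1] != number:
            values.append(number)
    return result


aggregated = collapse_consecutive(rows)
for row_id, values in aggregated.items():
    print(row_id, values)
```

Note that h9 keeps both 7s because they are not adjacent, which is the difference from collect_set().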