I have a simple table with 2 columns: UserID and Category, and each UserID can repeat with a few categories, like so:
UserID   Category
------   --------
1         A
1         B
2         C
3         A
3         C
3         B
I want to "dummify" this table: i.e. to create an output table that has a unique column for each Category consisting of dummy variables (0/1 depending on whether the UserID belongs to that particular Category):
UserID    A  B  C
------    -- -- --
1         1  1  0
2         0  0  1
3         1  1  1
My problem is that I have THOUSANDS of categories (not just 3 as in this example) and so this cannot be efficiently accomplished using CASE WHEN statement.
So my questions are:
1) Is there a way to "dummify" the Category column in Google BigQuery without using thousands of CASE WHEN statements.
2) Is this a situation where the UDF functionality works well? It seems like it would be the case but I am not familiar enough with UDF in BigQuery to solve this problem. Would someone be able to help out?
Thanks.
 
    