To retrieve the supplementary information of primary keys, I came up with two ways of doing this.
One is to use GROUP BY with aggregation function [1] and the other is to use PARTITION BY and a window/analytic function (~ row_number) [2].
I am suspecting GROUP BY to be the lower performance way due to the aggregation function (might not be true since the PARTITION BY way uses subquery), however, I am not sure which would be the better way of appending supplementary columns.
- Which would be the better way in terms of time complexity and performance?
- Also, is there a better way of appending the auxiliary columns?
Example case: to retrieve the users' last login info (such as ip, country_code, ...)
Possible Solution 1: GROUP BY with an aggregation function
#standardSQL
SELECT
    user_id,  -- primary key
    ARRAY_AGG(
        STRUCT(
            ip, country_code, os, channel_type
        )
        ORDER BY login_timestamp DESC LIMIT 1
    )[OFFSET(0)].*
FROM user_login_info_table
GROUP BY user_id
Possible Solution 2: PARTITION BY with a subquery
#standardSQL
WITH
login_log AS (
    SELECT
        user_id,
        ip, country_code, os, channel_type,
        ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY login_timestamp DESC
        ) AS rn
    FROM user_login_info_table
)
SELECT user_id, ip, country_code, os, channel_type
FROM login_log
WHERE rn = 1
c.f. The above example is an oversimplified example, which ignored other aggregation fields.
References
 
    