The title is a little wordy. I will try to explain what I want to solve--at this point I am quite lost so I really am looking for the best approach on how to solve this problem. I have a dataframe with data that looks something like this:
| id_not_unique | datetime | binary_var | 
|---|---|---|
| 111111111 | 2020-08-26 15:44:58 | 1 | 
| 111111111 | 2020-08-28 15:33:45 | 1 | 
| 222222222 | 2020-07-12 11:21:09 | 1 | 
| 222222222 | 2019-04-21 14:22:42 | 0 | 
I would like to create another column, let's call it 'identifier', that essentially, grouped by id_not_unique, gives a unique identifier based on binary_var. So for each unique id_not_unique, I would like datetime to be ordered in ascending order--this I can do. But then, for some pattern of binary_var being like 1, or [0, 1], the 'identifier' value would be unique. The resulting dataset should look something like:
| id_not_unique | datetime | binary_var | identifier | 
|---|---|---|---|
| 111111111 | 2020-08-26 15:44:58 | 1 | x_1 | 
| 111111111 | 2020-08-28 15:33:45 | 1 | x_2 | 
| 222222222 | 2019-04-21 14:22:42 | 0 | x_3 | 
| 222222222 | 2020-07-12 11:21:09 | 1 | x_3 | 
The binary_var sequence could be any of [1], [0, 1], [0, 0, 1], essentially any number of 0's followed by a 1. Currently, I am grouping the dataframe by id_not_unique and then after ordering datetime, I am iterating through binary_var and checking what value it is and what the value it is at index-1 to determine if the identifier should be new or the same as the identifier at index-1. But this is slow and my rows are at around 3 million. So is there another better method to approach this?