Note: this question is related to an existing question here. However, my question gives a more concrete example and applies more broadly.
Suppose we have the following pandas DataFrame:
  Questions  cnt similarity
0       ABC    1  [1, 2, 3]
1       abc    2  [1, 2, 3]
2       cba    3  [2, 3, 1]
3      abcd    4  [4, 5, 6]
4      dcsa    5  [2, 3, 1]
5      adcd    6  [4, 5, 6]
6      abcd    7  [1, 2, 3]
7       cba    8  [7, 8, 9]
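I need to add another column, cat, based on the similarity column: rows with the same similarity list belong to the same group and get the same cat value. Element order matters, so [1, 2, 3] and [2, 3, 1] are different groups, and groups are numbered in order of first appearance. The expected output is below. Note that the original dataset has 1M rows, so the approach needs to be efficient. Any input is appreciated. Thank you.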
  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
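One possible approach, sketched under the assumption that each similarity cell is a plain Python list: lists are unhashable, so convert each one to a tuple (which keeps element order) and let pd.factorize assign integer codes in order of first appearance. The DataFrame below just rebuilds the sample data from the question.

```python
import pandas as pd

# Sample data from the question; each similarity cell is a Python list.
df = pd.DataFrame({
    "Questions": ["ABC", "abc", "cba", "abcd", "dcsa", "adcd", "abcd", "cba"],
    "cnt": [1, 2, 3, 4, 5, 6, 7, 8],
    "similarity": [[1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
                   [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9]],
})

# Lists cannot be hashed, so turn each one into a tuple.
# Tuples preserve element order, so [1, 2, 3] and [2, 3, 1] stay distinct.
keys = df["similarity"].map(tuple)

# factorize gives 0-based codes in order of first appearance;
# add 1 so the numbering starts at 1 as in the expected output.
codes, uniques = pd.factorize(keys)
df["cat"] = codes + 1

print(df)
```

factorize is hash-based, so it should scale roughly linearly with the number of rows; the map(tuple) step is a Python-level loop, though, so it is worth timing on the full 1M-row data.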