If I have a dataframe like the following:
import pandas as pd

df = pd.DataFrame({'val':['a','b','c','d','e','f','g','h'],
                   'cat':['C','D','D','C','D','D','D','C'],
                   'num':[1,2,2,1,2,2,2,1],
                   'cat2':['X','Y','Y','X','Y','Y','Y','X']})
giving:
  val cat  num cat2
0   a   C    1    X
1   b   D    2    Y
2   c   D    2    Y
3   d   C    1    X
4   e   D    2    Y
5   f   D    2    Y
6   g   D    2    Y
7   h   C    1    X
You'll notice that the columns num and cat2 are redundant: in every row, the values of cat, num and cat2 always correspond in the same way (C == 1 == X and D == 2 == Y), so each of these columns carries the same information.
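One way to make this notion of redundancy concrete: two columns carry the same information when each value in one maps to exactly one value in the other, in both directions. Below is a minimal sketch of that pairwise check, assuming no missing values; `is_one_to_one` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'val': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                   'cat': ['C', 'D', 'D', 'C', 'D', 'D', 'D', 'C'],
                   'num': [1, 2, 2, 1, 2, 2, 2, 1],
                   'cat2': ['X', 'Y', 'Y', 'X', 'Y', 'Y', 'Y', 'X']})


def is_one_to_one(frame: pd.DataFrame, a: str, b: str) -> bool:
    """Hypothetical helper: True if each value of column a maps to exactly
    one value of column b and vice versa (assumes no missing values)."""
    # Every group of a must contain a single distinct value of b...
    a_to_b = frame.groupby(a)[b].nunique().eq(1).all()
    # ...and every group of b a single distinct value of a.
    b_to_a = frame.groupby(b)[a].nunique().eq(1).all()
    return bool(a_to_b and b_to_a)


print(is_one_to_one(df, 'cat', 'num'))   # True
print(is_one_to_one(df, 'cat', 'cat2'))  # True
print(is_one_to_one(df, 'cat', 'val'))   # False
```

Applied to every pair of columns, this is exactly the nested-loop approach, so its cost grows with the square of the number of columns.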
I'd like to identify the redundant columns so that I can discard them and keep just one representation, as below. Keeping num or cat2 instead of cat would be fine too.
  val cat
0   a   C
1   b   D
2   c   D
3   d   C
4   e   D
5   f   D
6   g   D
7   h   C
I can't think of a solution that doesn't involve nested loops, which get much more expensive as more columns are added, and I suspect there might be a clever way to address it. Other questions I've seen about redundant data usually deal with columns whose values are literally equal.
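A possible way to avoid comparing every pair of columns (a sketch, not the only approach): `pd.factorize` relabels a column's values as integer codes in order of first appearance, so two columns are one-to-one relabelings of each other exactly when their code arrays are identical. Finding identical code arrays then becomes an ordinary duplicate-row check, which pandas performs with hashing instead of an explicit pairwise loop. `drop_redundant_columns` is a hypothetical helper name, and the sketch assumes column names are unique.

```python
import pandas as pd

df = pd.DataFrame({'val': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                   'cat': ['C', 'D', 'D', 'C', 'D', 'D', 'D', 'C'],
                   'num': [1, 2, 2, 1, 2, 2, 2, 1],
                   'cat2': ['X', 'Y', 'Y', 'X', 'Y', 'Y', 'Y', 'X']})


def drop_redundant_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the first column of each group of
    one-to-one equivalent columns. Assumes column names are unique."""
    # Relabel each column as integer codes in order of first appearance:
    # 'C','D','D','C' and 1,2,2,1 both become 0,1,1,0.
    codes = pd.DataFrame({name: pd.factorize(col)[0] for name, col in frame.items()},
                         index=frame.index)
    # After transposing, each original column is a row; duplicated() flags
    # every row whose codes match an earlier row (keep='first' by default).
    redundant = codes.T.duplicated()
    return frame.loc[:, ~redundant.to_numpy()]


print(list(drop_redundant_columns(df).columns))  # ['val', 'cat']
```

The first column of each equivalent group, in column order, is the one kept, so putting num or cat2 ahead of cat would keep that one instead. Under this definition, any two columns whose values are all distinct (like val) would also count as redundant with each other. By default `factorize` codes missing values as -1, so NaNs must also line up for two columns to match.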
Thanks!