This dataset lists the correlation between pairs of columns. If you look at rows 3 and 42 (the row numbers are shown at the left), you will find they are the same; only the column positions are swapped, which does not affect the correlation. I want to remove row 42. But this dataset has many such pairs of rows with the same values, so I need a general algorithm that removes these mirrored rows and keeps only the unique ones.
    2 Answers
0
            
            
        You could try a self join. Without a code example, it's hard to answer, but something like this maybe:
df.merge(df, left_on="source_column", right_on="destination_column")
You can follow that up with a call to drop_duplicates.
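To make the self-join idea concrete, here is a minimal sketch. It assumes the frame has the columns `source_column`, `destination_column` and `correlation_Value`; the sample data and the `_mirror` suffix are made up for illustration. It joins on both columns in swapped order, so each row is matched with its mirror image, and then drops the later row of each mirrored pair.

```python
import pandas as pd

# Hypothetical sample data: row 1 is row 0 with the two columns swapped.
df = pd.DataFrame({
    "source_column": ["A", "B", "C", "D"],
    "destination_column": ["B", "A", "E", "F"],
    "correlation_Value": [1, 1, 2, 4],
})

# Keep the original row labels in a column named "index" so they survive the merge.
indexed = df.reset_index()

# Self join: a left row (x, y) matches a right row (y, x).
mirrored = indexed.merge(
    indexed,
    left_on=["source_column", "destination_column"],
    right_on=["destination_column", "source_column"],
    suffixes=("", "_mirror"),
)

# Of each mirrored pair, mark the row that appears later for removal.
to_drop = mirrored.loc[mirrored["index"] > mirrored["index_mirror"], "index"]
out = df.drop(index=to_drop)
print(out)
```

Rows without a mirror image never show up in the join result, so they are left alone.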
        suvayu
        
- This question might make it easier: https://stackoverflow.com/questions/32093829/remove-duplicates-from-dataframe-based-on-two-columns-a-b-keeping-row-with-max. The difference between that question and mine is that in my dataframe the rows are not identical, only similar. – Dijkstra Algorithm Jul 14 '21 at 12:37
- @DijkstraAlgorithm Instead of posting a screenshot, please post your data as text, along with the code for your approach. You cannot expect others to do the work for you. See the [guideline](https://stackoverflow.com/help/how-to-ask), particularly the section "Help others reproduce the problem". Also, did you look at the documentation for `drop_duplicates` I pointed to? It allows you to "ignore" certain columns. – suvayu Jul 14 '21 at 13:00
 
0
            As correlation_Value seems to be the same for both orderings, the operation should be commutative, so whatever the value, you only have to focus on the first two columns. Sort each pair into a tuple and remove the duplicates:
# Sorting gives (A, B) and (B, A) the same key.
# You can probably replace 'tuple(sorted(x))' by 'frozenset(x)'
key = df[['source_column', 'destination_column']] \
          .apply(lambda x: tuple(sorted(x)), axis='columns')
out = df.loc[~key.duplicated()]
>>> out
  source_column destination_column  correlation_Value
0             A                  B                  1
2             C                  E                  2
3             D                  F                  4
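If the frame is large, the row-wise `apply` can get slow. As a rough alternative sketch (not part of the answer above), the pairs can be sorted in one go with NumPy. The sample data here is an assumption, chosen only so that the result matches the output shown.

```python
import numpy as np
import pandas as pd

# Hypothetical input: row 1 mirrors row 0.
df = pd.DataFrame({
    "source_column": ["A", "B", "C", "D"],
    "destination_column": ["B", "A", "E", "F"],
    "correlation_Value": [1, 1, 2, 4],
})

# Sort each (source, destination) pair within its row,
# so that (B, A) becomes (A, B).
pairs = np.sort(df[["source_column", "destination_column"]].to_numpy(), axis=1)

# Wrap the sorted pairs in a frame aligned with df and keep
# the first occurrence of each pair.
key = pd.DataFrame(pairs, index=df.index)
out = df.loc[~key.duplicated()]
print(out)
```

The original column order inside each kept row is preserved, because the sorted pairs are only used as a key.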
        Corralien
        
