I have a very large matrix, and I know that some of its column names are duplicated. I want to find the duplicated column names and remove one column from each set of duplicates.
I tried duplicated(), but it removes the duplicate entries.
Could someone help me implement this in R?
Note that columns with duplicate names might not have duplicate entries.
    33
            
            
         
    
    
        user2806363
        
5 Answers
57
            
            
        Let's say temp is your matrix
temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")
##      A  A  B
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15
You could do
temp <- temp[, !duplicated(colnames(temp))]
##      A  B
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15
Or, if you want to keep the last duplicated column, you can do
temp <- temp[, !duplicated(colnames(temp), fromLast = TRUE)] 
##       A  B
## [1,]  6 11
## [2,]  7 12
## [3,]  8 13
## [4,]  9 14
## [5,] 10 15
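One thing to watch for with the indexing above: if only one distinct column name is left, `[` simplifies the result to a plain vector unless `drop = FALSE` is passed. A minimal sketch of how that could be guarded against, where `drop_dup_cols` and `from_last` are hypothetical names made up for illustration, not part of base R:

```r
# Hypothetical helper: keep the first (or last) column for each repeated name.
drop_dup_cols <- function(x, from_last = FALSE) {
  keep <- !duplicated(colnames(x), fromLast = from_last)
  # drop = FALSE keeps the matrix shape even when a single column survives
  x[, keep, drop = FALSE]
}

# Toy matrix in which both columns share the name "A"
m <- matrix(seq_len(10), 5, 2)
colnames(m) <- c("A", "A")

res_first <- drop_dup_cols(m)                    # keeps the first "A" (values 1 to 5)
res_last  <- drop_dup_cols(m, from_last = TRUE)  # keeps the second "A" (values 6 to 10)
dim(res_first)                                   # still two-dimensional: 5 rows, 1 column
```

Without `drop = FALSE`, the same subsetting on `m` would return an unnamed integer vector, which tends to break later code that expects a matrix.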
 
    
    
        David Arenburg
        
- 
                    Hi @david-arenburg. Thanks for such a useful solution. What if a data frame has two columns with different column names but the same values, i.e. duplicates where only the names differ? How would we approach that? – Roy Dec 20 '22 at 03:40
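For the follow-up about columns whose names differ but whose values are identical, one possible approach (a sketch, not something taken from the answer above) is to compare column contents instead of names. For a matrix, `duplicated()` accepts `MARGIN = 2` to compare columns; for a data frame, the columns can be compared as list elements. The objects `m` and `dat` below are made-up toy data:

```r
# Matrix case: columns A and B hold the same values under different names
m <- cbind(A = 1:3, B = 1:3, C = 4:6)
m_unique <- m[, !duplicated(m, MARGIN = 2), drop = FALSE]
colnames(m_unique)

# Data frame case: treat the columns as list elements and compare those
dat <- as.data.frame(m)
dat_unique <- dat[!duplicated(as.list(dat))]
names(dat_unique)
```

In both cases the first of the identical columns is kept; passing `fromLast = TRUE` to `duplicated()` should keep the last one instead.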
18
            
            
        Or, assuming data frames, you could use subset:
subset(iris, select = which(!duplicated(names(iris))))
Note that dplyr::select is not applicable here because it requires column-uniqueness in the input data already. 
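Since `iris` has no repeated names, here is a small sketch on a data frame that does. The data frame `dat` is made up for illustration; note that `check.names = FALSE` is needed to build it, because `data.frame()` otherwise renames the second `A` to `A.1`:

```r
# Build a data frame that really has two columns named "A"
dat <- data.frame(A = 1:3, A = 4:6, B = 7:9, check.names = FALSE)

# Keep only the first column for each name
res <- subset(dat, select = !duplicated(names(dat)))
names(res)
```

The plain indexing form `dat[!duplicated(names(dat))]` should give the same result without going through `subset()`.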
 
    
    
        Holger Brandl
        
- 
                    `iris <- iris %>% subset(., select = which(!duplicated(names(.))))` a pipe-friendly version – seapen Apr 23 '20 at 22:20
- 
                    No need for `which` here. Without `dplyr`, a correct version is `subset(iris, select = !duplicated(names(iris)))` – Maël Jul 28 '23 at 12:55
3
            
            
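        temp = matrix(seq_len(15), 5, 3)
colnames(temp) = c("A", "A", "B")
# Convert to a data frame so columns can be picked with single-bracket indexing
temp = as.data.frame.matrix(temp)
temp = temp[!duplicated(colnames(temp))]
# Convert back to a matrix
temp = as.matrix(temp)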
 
    
    
        David Buck
        
 
    
    
        sneha patil
        
- 
                    Why convert it to a dataframe and then back to matrix? How is it different from my answer? That you don't need to write an extra comma? – David Arenburg Sep 23 '20 at 06:26
- 
                    That is important because I couldn't get your solution to work because mine was a data.table data.frame. Once I converted it to a matrix, worked like a charm. The comma omission is incidental and does not affect anything. – Arani Jan 05 '21 at 12:02
1
            
            
        To remove a specific duplicate column by name, you can do the following:
test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)) & names(test) == "Species")
test = test[,-idx]
To remove all duplicated columns, it is a bit simpler:
test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)))
test = test[,-idx]
or:
test = cbind(iris, iris) # example with multiple duplicate columns
test = test[,!duplicated(names(test))]
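A caveat on the `-idx` form, as far as I know: when no names are duplicated, `which()` returns `integer(0)`, and negating an empty index selects nothing rather than everything. The logical form does not have this problem. A small sketch with a made-up data frame `dat` that has no duplicated names:

```r
# Toy data frame with unique column names
dat <- data.frame(A = 1:3, B = 4:6)

idx <- which(duplicated(names(dat)))   # empty: there is nothing to remove
n_negative <- ncol(dat[, -idx])        # negative empty index drops every column
n_logical  <- ncol(dat[, !duplicated(names(dat)), drop = FALSE])  # keeps both columns

c(n_negative, n_logical)
```

If the negative-index form is preferred, guarding it with `if (length(idx) > 0)` avoids the empty result.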
 
    
    
        Adam Erickson
        
0
            
            
        Store the indices of all your duplicated columns in one vector, say duplicates, and use -duplicates with single-bracket subsetting to remove those columns.
# Define vector of duplicate cols (don't change)
duplicates <- c(4, 6, 11, 13, 15, 17, 18, 20, 22,
                24, 25, 28, 32, 34, 36, 38, 40,
                44, 46, 48, 51, 54, 65, 158)
# Remove duplicates from food and assign it to food2
food2 <- food[, -duplicates]
 
    
    
        saswat prusty
        
- 
                    Not great to hard-code the duplicated column numbers. It's better and more flexible to do `which(duplicated(colnames(food)))` instead. – user3932000 Jun 20 '19 at 14:21