I have one data frame with 332 names and another with 56000. All of the 332 names are included in the larger data frame. How do I remove rows of data from the large data frame if the names are included in the smaller data frame?
- Welcome to SO! Can you post a minimal reproducible example? See: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – markus Apr 16 '20 at 22:03
- We're going to need to know what data structure they're stored in (vector, data frame, data table, tibble etc.). You can find this out with the `class()` function. – Daniel V Apr 16 '20 at 22:07
2 Answers
Using the built-in mtcars dataset in place of your large data frame, use the `%in%` operator to find the rows whose values appear in a reference data frame (your smaller one), and `!` to negate the match so that those rows are dropped. Change the data frame and column names to suit your needs.
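# SETUP: reference data frame holding the values to exclude
refDF <- data.frame("ID" = c(4,6))
# SOLUTION: keep only the rows whose cyl is not in refDF$ID
mtcars[!mtcars$cyl %in% refDF$ID, ]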
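Applied to the situation in the question, the same pattern might look like the sketch below. The data frames `big` and `small` and the column `name` are hypothetical stand-ins, since the question does not show the actual structure.

```r
# Hypothetical stand-ins for the 56000-row and 332-row data frames
big <- data.frame(name = c("Ann", "Bob", "Cal", "Dee", "Eve"),
                  score = c(10, 20, 30, 40, 50),
                  stringsAsFactors = FALSE)
small <- data.frame(name = c("Bob", "Dee"), stringsAsFactors = FALSE)

# TRUE for every row of big whose name also appears in small
in_small <- big$name %in% small$name

# Keep only the rows that are NOT in small
big_filtered <- big[!in_small, ]
big_filtered
```

Note that `%in%` compares values exactly, so differences in case or stray whitespace between the two name columns would prevent a match.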
 
    
    
        rg255
        
The same can be done with `filter()` from dplyr:
library(dplyr)
# Keep only the rows whose cyl is not in refDF$ID
mtcars %>%
  filter(!cyl %in% refDF$ID)
Data
refDF <- data.frame("ID" = c(4,6))
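Another option in dplyr is `anti_join()`, which returns the rows of the first table that have no match in the second. A minimal sketch, again using `mtcars` and `refDF`; the `by` mapping from `cyl` to `ID` is needed here only because the two key columns have different names.

```r
library(dplyr)

refDF <- data.frame(ID = c(4, 6))

# Keep the rows of mtcars whose cyl has no match in refDF$ID
result <- anti_join(mtcars, refDF, by = c("cyl" = "ID"))
unique(result$cyl)
```

If both data frames share a column called, say, `name`, the call would reduce to `anti_join(big, small, by = "name")`, where `big` and `small` are hypothetical names for the two data frames in the question.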
 
    
    
        akrun
        
