I am working with different data sets in R. Each data set will have upto 16 columns and 1000s of records. I am trying to find a way to compare two data sets at a time so that I can find the records that were deleted/Updated/Added. I will be using the ID column and color column to identify the variances. Below is a small set example (have not included all the columns):
df1 <- data.frame(ID = letters[1:5], color = c("blue", "white", "red", "green", "blue"))
df2 <- data.frame(ID = c("a","c","d","d"), color = c("blue", "yellow", "green", "blue"))
ID will be the common factor between the datasets.
I need to compare the datasets to get three different set of values:
New Records: records that appear in df1 but not in df2. So I should get:
ID  Color
b   white
c   red
e   blue
Deleted records: records that DO NOT appear in df1 but do appear in df2
   ID    Color
    c     yellow
    d     blue
Updated records This is the most important one I need. Basically anything with the same ID that has a different color:
   ID  df1color  df2color
    c   red       yellow
I have tried using the joins from dplyr package....but was not successful. Is there a way I can do this in R.
 
     
    