Hi everyone I have the datasets below and i would like to identify each row from each of the two sets in R.
I have tried sorting each in ascending order then appending row indices but still in vain.
As you can see i am trying to detect records marked red on the RHS as missing in the LHS.
Anyone with clue or link to a tutorial I will really appreciate.
The problem looks outrageous but that's what am dealing with currently
This is a simple reprex output.
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df1 = data.frame(x = c(2000,3000,3000,48400,2000))
df2 = data.frame(y = c(2000,2000,2000,3000,3000,3000,41200,48400))
df1 = df1 %>% arrange(x) %>% group_by(x) %>% mutate(rownum = row_number()) %>% 
  mutate(uniqueid = paste0(x,"_",rownum))
df2 = df2 %>% arrange(y) %>% group_by(y) %>% mutate(rownum = row_number()) %>% 
  mutate(uniqueid = paste0(y,"_",rownum))
left_join(df2,df1)
#> Joining, by = c("rownum", "uniqueid")
#> # A tibble: 8 x 4
#> # Groups:   y [4]
#>       y rownum uniqueid     x
#>   <dbl>  <int> <chr>    <dbl>
#> 1  2000      1 2000_1    2000
#> 2  2000      2 2000_2    2000
#> 3  2000      3 2000_3      NA
#> 4  3000      1 3000_1    3000
#> 5  3000      2 3000_2    3000
#> 6  3000      3 3000_3      NA
#> 7 41200      1 41200_1     NA
#> 8 48400      1 48400_1  48400
df2
#> # A tibble: 8 x 3
#> # Groups:   y [4]
#>       y rownum uniqueid
#>   <dbl>  <int> <chr>   
#> 1  2000      1 2000_1  
#> 2  2000      2 2000_2  
#> 3  2000      3 2000_3  
#> 4  3000      1 3000_1  
#> 5  3000      2 3000_2  
#> 6  3000      3 3000_3  
#> 7 41200      1 41200_1 
#> 8 48400      1 48400_1
The trickiest part here is that the data sets should be arranged as they appear in the picture since each row in each table refer to a specific transaction.
My output above flags the third 2000 value and not the second on as it is supposed to.

 
    