I have a large data frame that looks like this. I want to find which genes match the others based on an overlap between the start and end positions.
library(tidyverse)
data <- data.frame(group=c(1,1,1,2,2,2),
                     genes=c("A","B","C","D","E","F"), 
                     start=c(1000,2000,3000,800,400,2000),
                     end=c(1500,2500,3500,1200,500,10000))
data
#>   group genes start   end
#> 1     1     A  1000  1500
#> 2     1     B  2000  2500
#> 3     1     C  3000  3500
#> 4     2     D   800  1200
#> 5     2     E   400   500
#> 6     2     F  2000 10000
Created on 2022-12-05 with reprex v2.0.2
I want something like this.
data
#>   group genes start   end   match
#> 1     1     A  1000  1500    A-D
#> 2     1     B  2000  2500    B-F
#> 3     1     C  3000  3500    C-F
#> 4     2     D   800  1200    A-D
#> 5     2     E   400   500    NA
#> 6     2     F  2000 10000    F-C-B
I am a bit lost on how to start. Any help is appreciated
 
     
    