I have two dataframes. The first dataframe is a list of genetic variants, their identifiers, and their position on a chromosome. The second is a list of genes where columns in each row specify the start and stop position of a gene on a chromosome.
I want to see which genetic variants fall within a gene's 'range', denoted by the start_20 and stop_20 columns. A genetic variant may fall into the range of more than one gene. For example, here SNP "rs1" should map to both gene A and gene B.
This is what I have tried so far:
# data frame of gene ranges
chromosome<-c("1", "1", "2")
start_20<-c("1", "1", "5")  
stop_20<-c("4", "4", "6")  
gene<-c("A", "B", "C") 
genelist=data.frame(chromosome, start_20,  stop_20, gene,stringsAsFactors=F  )
# data frame of SNPs and their positions
chromosome<-c("1", "2")
snp<-c("rs1", "rs2")
position<-c("3", "5") 
snplist=data.frame(chromosome,snp,position,stringsAsFactors=F)
# The aim is to match SNPs to genes via base-pair positions (i.e. rs1 has a position of 3, meaning it maps to gene A and gene B).
genelist.bychrome <- vector("list", 2)
# List of genes by chromosome.
for(i in 1:2) genelist.bychrome[[i]] <- genelist[genelist[,"chromosome"]==i,]  
# Empty container of length nrow(snplist). Put matched genes in here if you find one.
gene.matched <- rep("",nrow(snplist))
gene.matched<-as.list(gene.matched)
# looping across each observation in snplist
for (i in 1:nrow(snplist)) {
  # snplist[i,"chromosome"] is the chromosome of interest.
  # Because of consecutive ordering, genelist.bychrome[[3]] gives the genelist for chromosome 3.
  # Therefore, genelist.bychrome[[ snplist[i,"chromosome"] ]] gives the genelist for the chromosome of interest.
  # VERY IMPORTANT: get.gene gives the index in genelist.bychrome[[ snplist[i,"chromosome"] ]], NOT genelist.
  if (snplist[i,"chromosome"] <= 1) {
    # Get the matching list element of genelist.bychrome.
    # In this element, collect indices for rows where the stop position is >= the position of the SNP
    # and the start position is <= the position of the SNP.
    # This should collect multiple rows for some SNPs.
    # Dump the gene for this index in the matching element of gene.matched,
    # i.e. get.gene <- which((genelist.bychrome[[1]][,"stop_20"] >= snplist[1,3]) & (genelist.bychrome[[1]][,"start_20"] <= snplist[1,3]))
    #      gene.matched <- genelist.bychrome[[1]][get.gene,"gene"]
    get.gene <- which((genelist.bychrome[[ snplist[i,"chromosome"] ]][,"stop_20"] >= snplist[i,"position"]) &
                      (genelist.bychrome[[ snplist[i,"chromosome"] ]][,"start_20"] <= snplist[i,"position"])) # correct
    if (length(get.gene) != 0) gene.matched[i] <- genelist.bychrome[[ snplist[i,"chromosome"] ]][get.gene,"gene"]
  }
} # end for()
# bind the matched genes to the snplist
snplist.new <- cbind(snplist,gene.matched)
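For comparison, here is a minimal sketch of a loop-free version of the same idea. It assumes the positions can be stored as numbers instead of strings (that change is an assumption, not part of the code above). It uses base R merge() to pair each SNP with every gene on the same chromosome and then keeps only the pairs where the position falls between start_20 and stop_20. The names candidates and hits are placeholders chosen for illustration.

```r
# Toy data from above, but with positions as numbers so that comparisons
# are numeric rather than alphabetical (an assumption about the real data).
genelist <- data.frame(
  chromosome = c("1", "1", "2"),
  start_20   = c(1, 1, 5),
  stop_20    = c(4, 4, 6),
  gene       = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
snplist <- data.frame(
  chromosome = c("1", "2"),
  snp        = c("rs1", "rs2"),
  position   = c(3, 5),
  stringsAsFactors = FALSE
)

# Pair every SNP with every gene on the same chromosome.
candidates <- merge(snplist, genelist, by = "chromosome")

# Keep only the pairs where the SNP lies inside the gene's range.
in_range <- candidates$position >= candidates$start_20 &
            candidates$position <= candidates$stop_20
hits <- candidates[in_range, c("chromosome", "snp", "position", "gene")]

# One row per SNP-gene match, so a SNP inside two genes appears twice.
print(hits)
```

SNPs that fall in no gene are dropped from hits, and the intermediate merge can get large on genome-scale data, so this is only meant as an illustration on the toy example.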
Any tips would be much appreciated! Thank you.
 
     
    