I'm a beginner with R (and coding in general). In January 14 I can hopefully start and finish an R course, but I would like to learn some beforehand. I understand the basics and have used functions like read.table, intersect, cbind, paste, and write.table. However, I was only able to achieve part of what I want to do with two input files (shortened samples):
REF.CSV
SNP,Pos,Mut,Hg  
M522 L16 S138 PF3493 rs9786714,7173143,G->A,IJKLT-M522  
P128 PF5504 rs17250121,20837553,C->T,KLT-M9  
M429 P125 rs17306671,14031334,T->A,IJ-M429  
M170 PF3715 rs2032597,14847792,A->C,I-M170  
M304 Page16 PF4609 rs13447352,22749853,A->C,J-M304  
M172 Page28 PF4908 rs2032604,14969634,T->G,J2-M172  
L228,7771358,C->T,J2-M172  
L212,22711465,T->C,J2a-M410  
SAMPLE.CSV
SNP,Chr,Allele1,Allele2  
L16,Y,A,A  
P128,Y,C,C  
M170,Y,A,A  
P123,Y,C,C  
M304,Y,C,C  
M172,Y,T,G  
L212,Y,-0,-0 
Description of what I would like to do:
A) Check if SAMPLE.SNP is in REF.SNP  
B) If YES, check the SAMPLE allele status (first read, second read) against REF.Mut (Ancestral->Derived)  
  B1) if both alleles are the same and match Derived, create output "+ Allele1-Allele2"  
  B2) if both alleles are the same and match Ancestral, create output "- Allele1-Allele2"  
  B3) if the alleles are not the same, check if Allele2 is Derived and create output "+ Allele1-Allele2"  
  B4) if both alleles are "-0", create output "? NC"  
  B5) else create output "? Allele1-Allele2"  
B6) If NO, create output "? NA"  
C) Write REF.CSV plus the output in a new column (Sample) to create the OUTPUT file  
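Rules B1 to B6 could perhaps be expressed as one small function. This is only a sketch under assumptions: the name `classify_call` is hypothetical, a SNP that is missing from the sample is assumed to arrive as `NA` alleles, and the `Mut` field is assumed to always have the form `Ancestral->Derived`.

```r
# Hypothetical helper: classify one genotype against an "Ancestral->Derived" string.
classify_call <- function(allele1, allele2, mut) {
  # B6: SNP not present in the sample (assumed to be passed in as NA)
  if (is.na(allele1) || is.na(allele2)) return("? NA")
  # B4: no-call
  if (allele1 == "-0" && allele2 == "-0") return("? NC")

  # Split "G->A" into ancestral "G" and derived "A"
  parts <- strsplit(mut, "->", fixed = TRUE)[[1]]
  ancestral <- parts[1]
  derived <- parts[2]
  genotype <- paste(allele1, allele2, sep = "-")

  if (allele1 == allele2) {
    if (allele1 == derived) return(paste("+", genotype))    # B1
    if (allele1 == ancestral) return(paste("-", genotype))  # B2
  } else if (allele2 == derived) {
    return(paste("+", genotype))                            # B3
  }
  paste("?", genotype)                                      # B5
}

classify_call("A", "A", "G->A")
classify_call("T", "G", "T->G")
```

Because the function handles one row at a time, it would have to be applied per row, for example with `mapply()` over the allele and `Mut` columns.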
OUTPUT.CSV (desired result)
SNP,Pos,Mut,Hg,Sample  
M522 L16 S138 PF3493 rs9786714,7173143,G->A,IJKLT-M522,+ A-A  
P128 PF5504 rs17250121,20837553,C->T,KLT-M9,- C-C  
M429 P125 rs17306671,14031334,T->A,IJ-M429,? NA  
M170 PF3715 rs2032597,14847792,A->C,I-M170,- A-A  
M304 Page16 PF4609 rs13447352,22749853,A->C,J-M304,+ C-C  
M172 Page28 PF4908 rs2032604,14969634,T->G,J2-M172,+ T-G  
L228,7771358,C->T,J2-M172,? NA  
L212,22711465,T->C,J2a-M410,? NC  
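One detail of the data above is that the SNP field in REF.CSV holds several space-separated names per row, while SAMPLE.CSV holds a single name, so an exact comparison of the two columns would miss rows such as `M522 L16 S138 PF3493 rs9786714`. The following is a minimal sketch of one way to do the lookup for step A, using shortened inline copies of the two files. The variable names (`ref`, `smp`, `aliases`, `hit`) are made up for illustration, and it assumes that at most one sample row is of interest per REF row.

```r
# Shortened inline copies of the two files; colClasses = "character" keeps
# allele columns from being converted (a column of only "T" would become logical).
ref <- read.csv(text = "SNP,Pos,Mut,Hg
M522 L16 S138 PF3493 rs9786714,7173143,G->A,IJKLT-M522
M429 P125 rs17306671,14031334,T->A,IJ-M429
L212,22711465,T->C,J2a-M410", colClasses = "character", strip.white = TRUE)

smp <- read.csv(text = "SNP,Chr,Allele1,Allele2
L16,Y,A,A
L212,Y,-0,-0", colClasses = "character", strip.white = TRUE)

# Split each REF name field into its individual SNP names
aliases <- strsplit(ref$SNP, " ", fixed = TRUE)

# For each REF row: index of the first sample row whose SNP is one of the
# names, or NA if the sample does not contain any of them
hit <- vapply(aliases, function(a) match(TRUE, smp$SNP %in% a), integer(1))

# Carry the alleles over; rows without a match get NA
ref$Allele1 <- smp$Allele1[hit]
ref$Allele2 <- smp$Allele2[hit]
ref
```

All REF rows are kept, which is what step C needs. The result could then be written with `write.csv(ref, "OUTPUT.CSV", row.names = FALSE)` once the Sample column has been built from the two allele columns.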
The functions I have found interesting and tried so far:
Variant1: A) is done, but I guess it is not possible to do C) with this approach?
I have not tried to code B) here.
GT <- read.table("SAMPLE.CSV",sep=',',skip=1)[,c(1,3,4)]  
rownames(GT) <- GT[,1]  
REF <- read.table("REF.CSV",sep=',')  
rownames(REF) <- REF[,1]  
COMMON <- intersect(rownames(GT),rownames(REF))  
REF <- REF[COMMON,]  
GT <- GT[COMMON,]  
GT <- cbind(REF,paste(GT[,2],'-',GT[,3],sep=''))  
write.table(GT,file='OUTPUT.CSV',sep=',',quote=F,row.names=F,col.names=F)  
Variant2: This is probably a complete mess, forgive me. I was just trying to build a solution with for and if constructs, but I probably haven't understood R's syntax and logic. I was not able to get this to run for A) and C). I have not tried to code B) here.
GT<-read.table("SAMPLE.CSV",sep=',',skip=1)[,c(1,3,4)]
rownames(GT)<-GT[,1]
REF <- read.table("REF.CSV",sep=',')
rownames(REF)<-REF[,1]
for (i in (nrow(REF))) {
   for (j in (nrow(GT))) {
       if (GT[j,] %in% REF[i,]) {
       ROWC[i,]<-cbind(REF[i,],paste(GT[j,2],"-",GT[j,3],sep=',')) 
       } else {
       ROWC[i,]<-cbind(REF[i,],"NA",sep=',') 
       }
   }   
}
write.table(ROWC,file='OUTPUT.CSV',quote=F,row.names=F,col.names=F) 
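One thing that may matter for the loop form: `for (i in (nrow(REF)))` visits only the single value `nrow(REF)`, not every row, and a result object has to exist before it can be filled by index. The small sketch below illustrates both points with a made-up data frame `df` (not the real files); `seq_len()` is the usual way to iterate over all row numbers.

```r
# Made-up data frame, only to illustrate the loop behaviour
df <- data.frame(x = c("a", "b", "c"), stringsAsFactors = FALSE)

# Loops over the single value nrow(df), i.e. only the last row
visited_wrong <- c()
for (i in nrow(df)) visited_wrong <- c(visited_wrong, i)

# Loops over 1, 2, ..., nrow(df)
visited_all <- c()
for (i in seq_len(nrow(df))) visited_all <- c(visited_all, i)

# Create the result vector first, then fill it row by row
out <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  out[i] <- paste(df$x[i], "NA", sep = ",")
}
out
```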
I would be happy if you could just indicate which logic/functions would help me accomplish the task I have described. I will then try to figure it out. Thanks.
