I have a PHYLIP-formatted text file of 300+ aligned COI sequences. I am trying to collapse the sequences into haplotypes for analysis using an R script written by a friend. The part I am having trouble with is where the script compares each sequence to the following one and determines whether they differ by anything other than N characters. It runs through the first few sequences before throwing the following error:
`Error in if (dif.nuc1[p] == "N" | dif.nuc2[p] == "N") { : missing value where TRUE/FALSE needed`
There are no gaps in the alignment, so there shouldn't be any NA values to handle.
Any idea what the issue is and/or how to fix it? Alternatively, any recommendations for programs to consolidate haplotypes would also be appreciated.
Thank you in advance.
Below is the console output:
>       if (sum(dif.nuc1=='N')==0 & sum(dif.nuc2=='N')==0){       
+       } else if (length(dif.nuc1)!=0){
+         counter<- 0
+         for (p in 1:length(dif.nuc1)){       
+           cat('p is', p, '\n')
+           if (dif.nuc1[p]== 'N'| dif.nuc2[p]== 'N'){
+             counter<- (counter + 1)
+           }
+         }  
+         if (counter == length(dif.nuc1)){        
+           hap.equiv<- c(hap.equiv, paste('Hap_', m, ' == Hap_', n, '  ', sep=''))
+         }
+       }
Error in if (dif.nuc1[p] == "N" | dif.nuc2[p] == "N") { : 
missing value where TRUE/FALSE needed
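For context, here is a minimal sketch of how this message can arise in R independently of my data. `if` needs a single TRUE or FALSE, and comparing NA with `==` yields NA. Indexing a vector past its end also yields NA, so two vectors of unequal length would be one possible trigger. The toy vectors `x` and `y` are hypothetical and not taken from the script; I have not confirmed this is what happens in my run.

```r
# Hypothetical toy vectors, not taken from the real script
x <- c("A", "N", "T")
y <- c("A", "N")                     # one element shorter than x

y[3]                                 # indexing past the end returns NA
cond <- x[3] == "N" | y[3] == "N"    # FALSE | NA evaluates to NA
is.na(cond)

# if (cond) { ... } would stop with "missing value where TRUE/FALSE needed".
# isTRUE() turns NA into FALSE, so the result is always usable inside if().
safe <- isTRUE(x[3] == "N") || isTRUE(y[3] == "N")
```

If something like this is the cause, printing `length(dif.nuc1)` and `length(dif.nuc2)` just before the loop should show whether the two vectors ever differ in length.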
I have tried modifying the code in the following ways to handle NA values, but this did not solve the issue:
if (dif.nuc1[p] == 'N' | dif.nuc2[p] == 'N' | is.na(dif.nuc1[p]) | is.na(dif.nuc2[p])) {
if (sum(dif.nuc1 %in% c('N', 'NA')) == 0 & sum(dif.nuc2 %in% c('N', 'NA')) == 0) {
          } else if (length(dif.nuc1)!=0){
            counter<- 0
            for (p in 1:length(dif.nuc1)){
              cat('p is', p, '\n')
              if (dif.nuc1[p]== 'N'| dif.nuc2[p]== 'N'){
                counter<- (counter + 1)
I have also double-checked my data to ensure there are no ambiguity codes and no gaps that would cause NA values.
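The kind of check I mean can be sketched on toy data as below. The sequences `seq1` and `seq2` are made up, and the way `dif.nuc1` and `dif.nuc2` are built here (taking the positions where the two sequences differ) is my assumption about what the script does, not its actual code. The last lines show a vectorised count using `%in%`, which returns FALSE rather than NA for missing values.

```r
# Hypothetical stand-ins for two aligned sequences
seq1 <- strsplit("ACGTNACGT", "")[[1]]
seq2 <- strsplit("ACGTAACNT", "")[[1]]

# Tabulate every character present; anything other than A, C, G, T, N
# (including NA) would show up here
char.counts <- table(c(seq1, seq2), useNA = "ifany")

# Assumed construction: bases at the positions where the sequences differ
dif.pos  <- which(seq1 != seq2)
dif.nuc1 <- seq1[dif.pos]
dif.nuc2 <- seq2[dif.pos]

# Sanity checks on the two vectors being compared
same.length <- length(dif.nuc1) == length(dif.nuc2)
any.na      <- anyNA(dif.nuc1) || anyNA(dif.nuc2)

# NA-safe count of differing positions where at least one base is N
n.only <- sum(dif.nuc1 %in% "N" | dif.nuc2 %in% "N")
all.n  <- length(dif.pos) > 0 && n.only == length(dif.pos)
```

In this toy case both differing positions involve an N, so `all.n` is TRUE and the two sequences would count as the same haplotype.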
