Pileup() function is over-counting the number of reads at variant location. (I inherited this code from someone.)
I've looked through the pileup parameters but none seem to cause over-counting. I visualized variants in IGV and saw the number of reads don't match with what pileup is reporting. Pileup is accurate when it reports read counts for variants with low number of reads (ie. 3, or 8). Pileup is inaccurate when it reports read counts for variants with high number of reads (ie. 20, 60).
res <- pileup( bf, snp, 
        scanBamParam=ScanBamParam( flag = scanBamFlag( hasUnmappedMate=F, isProperPair=T, isDuplicate=F ), which = snp ), 
        pileupParam=PileupParam( distinguish_strands=F, min_base_quality=10, max_depth=1e4 ) )
where bf is bam file, snp is grange object listing all snps found.
> bf
class: BamFile 
path: Sample_4.split.bam
index: Sample_4.split.bai
isOpen: FALSE 
yieldSize: NA 
obeyQname: FALSE 
asMates: FALSE 
qnamePrefixEnd: NA 
qnameSuffixStart: NA 
> head(snp)
GRanges object with 6 ranges and 5 metadata columns:
              seqnames    ranges strand | paramRangeID            REF
                 <Rle> <IRanges>  <Rle> |     <factor> <DNAStringSet>
   rs13475703     chr1   5124338      + |         <NA>              A
    rs6408157     chr1   9547068      + |         <NA>              A
                             ALT      QUAL      FILTER
              <DNAStringSetList> <numeric> <character>
   rs13475703                  G    233.77        PASS
    rs6408157                  G     52.77        PASS
  -------
  seqinfo: 66 sequences from mm10 genome
> res[66,]
   seqnames      pos nucleotide count            which_label
66     chr1 87942764          G    61 chr1:87942764-87942764
For the example above, for the G variant at chr1:87942764, pileup reports 61 reads. When I open the bam file in IGV, I only count 41 reads supporting the G variant.
There are no error messages. Just that the counts reported do not match with what I see in IGV.
