I have this data set:
ItemNumber  Successes   Trials    Prob
15          14           95       0.047
9625        20           135      0.047
19          14           147      0.047
24          12           120      0.047
20          15           133      0.047
22          8            91       0.047
9619        16           131      0.047
10006       8            132      0.047
25          15           127      0.047
I want to identify a culmulative binomial distribution p value for each item, to understand the probability of observing an equal or higher number of item occurrences.
I used this code:
import sys
import scipy
from scipy.stats.distributions import binom
import sys
for line in open(sys.argv[1], 'r').readlines():
    line = line.strip().split()
    Item,num_succ,num_trials,prob = line[0],int(line[1]),int(line[2]),float(line[3])
    print Item + "\t" + str(num_succ) + "\t" + str(num_trials) + "\t" + str(prob) + "\t" + str(1 - (binom.cdf(num_succ, num_trials, prob)))
The output looks like this:
Item    NumSucc NumTrials   Prob    Binomial
15      14      95         0.047    3.73e-05
9625    20      135        0.047    1.48e-06
19      14      147        0.047    0.004
24      12      120        0.047    0.0043
20      15      133        0.047    0.00054
22      8       91         0.047    0.027
9619    16      131        0.047    0.0001
10006   8       132        0.047    0.169
25      15      127        0.047    0.0003
The problem: When I pick a line and check the obtained cumulative binomial p value against an online tool like this: http://stattrek.com/online-calculator/binomial.aspx , the results are not the same.
For example,
For item 20 (# successes = 15, # trials = 133, Prob = 0.047):
My Binomial P Val = 0.00054
StatTrek P Val = 0.0015
However, I can see from StatTrek, that what I have looked up is the Culmulative Probability: P(X>15), but since I want "equal to or greater than", what I actually want to calculate is the P(X>=15) (which is 0.0015).
I'm struggling to correctly edit the above code, to change the P value returned from "find number of incidences greater than" to "find the number of incidences greater than or equal to". If someone could demonstrate I would appreciate it. If you look at this question, I was trying to follow Volodymyr's comment.