I was going through the SciPy code for the two-sample KS test, which calculates the maximum distance between the CDFs of two given samples. The code for calculating the cumulative distribution function (CDF) is shown below.
I fail to understand the logic in the lines that calculate the CDFs. First, data1 and data2 are sorted, and then np.searchsorted is used to find the position of each element of data_all in both data1 and data2. data_all is nothing but the concatenation of the sorted data1 and data2.
What if the minimum value of data2 is below the minimum of data1? Doesn't that violate the assumption that a CDF should never decrease as the value increases?
data_all = np.concatenate([data1, data2])
cdf1 = np.searchsorted(data1, data_all, side='right') / (1.0 * n1)
cdf2 = np.searchsorted(data2, data_all, side='right') / (1.0 * n2)
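
To make the question concrete, here is a minimal sketch with made-up values in which the minimum of data2 lies below the minimum of data1. The sample arrays and the names order, cdf1_sorted, cdf2_sorted and d are assumptions for illustration and are not taken from the SciPy source. It suggests that cdf1 and cdf2 hold each empirical CDF evaluated at the points of data_all, in the order of data_all, so the arrays as stored need not be non-decreasing even though each CDF is non-decreasing in the value itself.

```python
import numpy as np

# Hypothetical samples: min(data2) = 1.0 is below min(data1) = 3.0.
data1 = np.sort(np.array([3.0, 5.0, 7.0]))
data2 = np.sort(np.array([1.0, 4.0, 6.0, 8.0]))
n1, n2 = len(data1), len(data2)

# Same three lines as in the excerpt above.
data_all = np.concatenate([data1, data2])
# side='right' counts how many sample values are <= each point of data_all,
# so dividing by the sample size gives the empirical CDF at that point.
cdf1 = np.searchsorted(data1, data_all, side='right') / (1.0 * n1)
cdf2 = np.searchsorted(data2, data_all, side='right') / (1.0 * n2)

# data_all is not globally sorted, so cdf1 as stored jumps from 1.0 back to 0.0.
# Reordering by the evaluation point recovers non-decreasing sequences.
order = np.argsort(data_all)
cdf1_sorted = cdf1[order]
cdf2_sorted = cdf2[order]

# The maximum absolute difference does not depend on the evaluation order.
d = np.max(np.abs(cdf1 - cdf2))
print(cdf1)          # values in data_all order
print(cdf1_sorted)   # values in increasing order of the evaluation point
print(d)             # 0.25 for these made-up samples
```

In this sketch, only the set of evaluation points matters for the maximum, which may be why the concatenated array is not re-sorted before the CDFs are compared.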