I have a pandas DataFrame indexed by ID and sorted by value. I want to draw a sample of n=20000 pairs, i.e. 40000 rows in total, where the 2 rows of each pair are consecutive. I then want to perform additional calculations on each pair of consecutive rows.
For example, with a sample size of n=2, I want to randomly pick 2 rows and, for each pick, find the difference in distance between it and the row that follows it.
Additional condition: the difference in value between the two rows of a pair can't exceed 4000.
index       value   distance
cg13869341  15865   1.635450
cg14008030  18827   4.161332
Then the difference in distance for the next pick and the row following it, and so on:
cg20826792  29425   0.657369
cg33045430  29407   1.708055
Sample of the original DataFrame:
index       value   distance
cg13869341  15865   1.635450
cg14008030  18827   4.161332
cg12045430  29407   0.708055
cg20826792  29425   0.657369
cg33045430  69407   1.708055
cg40826792  59425   0.857369
cg47454306  88407   0.708055
cg60826792  96425   2.857369
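One way to frame this, sketched below under stated assumptions rather than as a definitive method: build every (row, following row) pair up front with pandas' shift(-1), keep only the pairs that satisfy the value condition, and then sample n of those pairs. The helper names consecutive_pairs and sample_pairs are hypothetical, the rows are the sample above, and "can't exceed 4000" is assumed to mean an absolute difference. Pairing follows the frame's current row order, so it assumes df is already sorted by value as described.

```python
import pandas as pd

# The sample rows from the question (the real frame has 480136 rows).
df = pd.DataFrame(
    {
        "value": [15865, 18827, 29407, 29425, 69407, 59425, 88407, 96425],
        "distance": [1.635450, 4.161332, 0.708055, 0.657369,
                     1.708055, 0.857369, 0.708055, 2.857369],
    },
    index=pd.Index(
        ["cg13869341", "cg14008030", "cg12045430", "cg20826792",
         "cg33045430", "cg40826792", "cg47454306", "cg60826792"],
        name="index",
    ),
)


def consecutive_pairs(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pair every row with the row that follows it."""
    ids = frame.index.to_series()
    pairs = pd.DataFrame(
        {
            "id_1": ids.to_numpy(),
            # shift(-1) moves each column up one row, i.e. the following row
            "id_2": ids.shift(-1).to_numpy(),
            "value_diff": (frame["value"].shift(-1) - frame["value"]).to_numpy(),
            "distance_diff": (frame["distance"].shift(-1) - frame["distance"]).to_numpy(),
        }
    )
    # The last row has no following row (its shifted values are NaN), so drop it.
    return pairs.iloc[:-1]


def sample_pairs(frame: pd.DataFrame, n: int, max_value_diff: float = 4000,
                 random_state=None) -> pd.DataFrame:
    """Hypothetical helper: sample n consecutive pairs meeting the value condition."""
    pairs = consecutive_pairs(frame)
    # Assumption: the 4000 limit applies to the absolute value difference.
    valid = pairs[pairs["value_diff"].abs() <= max_value_diff]
    # Raises ValueError if fewer than n pairs pass the condition.
    return valid.sample(n=n, random_state=random_state)


result = sample_pairs(df, n=2, random_state=0)
print(result)
```

On the full data this would be called with n=20000. Because pairs are drawn independently, two sampled pairs can share a row (for example row A paired with B, and B paired with C), so the sample is not guaranteed to contain 40000 distinct rows.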
I tried using df_sample = df.sample(n=20000), but then I got a bit lost trying to figure out how to get the next row for each row in df_sample.
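A minimal sketch of how df.sample could be extended to fetch the following row, assuming the IDs in the index are unique: exclude the last row from the draw, map the sampled labels back to integer positions with Index.get_indexer, and take df.iloc at those positions plus one. The function name sample_with_next_row is hypothetical, and the rows are the first four from the sample above.

```python
import pandas as pd

# First four rows of the question's sample data.
df = pd.DataFrame(
    {"value": [15865, 18827, 29407, 29425],
     "distance": [1.635450, 4.161332, 0.708055, 0.657369]},
    index=pd.Index(["cg13869341", "cg14008030", "cg12045430", "cg20826792"],
                   name="index"),
)


def sample_with_next_row(frame: pd.DataFrame, n: int, random_state=None) -> pd.DataFrame:
    """Hypothetical helper: sample n rows and pair each with its following row."""
    # Draw from every row except the last, which has no following row.
    first = frame.iloc[:-1].sample(n=n, random_state=random_state)
    # Translate the sampled ID labels back into integer positions
    # (get_indexer requires the index labels to be unique).
    pos = frame.index.get_indexer(first.index)
    # The following row of each sampled row is simply the next position.
    second = frame.iloc[pos + 1]
    return pd.DataFrame(
        {
            "id_1": first.index.to_numpy(),
            "id_2": second.index.to_numpy(),
            "value_diff": second["value"].to_numpy() - first["value"].to_numpy(),
            "distance_diff": second["distance"].to_numpy() - first["distance"].to_numpy(),
        }
    )


pairs = sample_with_next_row(df, n=2, random_state=0)
# Applying the 4000 condition after sampling can leave fewer than n pairs.
pairs = pairs[pairs["value_diff"].abs() <= 4000]
print(pairs)
```

Filtering after the draw is the simplest change to the original attempt, but it shrinks the sample whenever a drawn pair breaks the condition; applying the condition before sampling avoids that.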
The original shape is (480136, 14).
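If the 40000 rows must all be distinct (no row shared by two pairs), one possible approach is a greedy draw: shuffle the start positions of all pairs that satisfy the value condition, then accept a pair only if neither of its rows has been used yet. This is a sketch under assumptions (absolute value difference, pairing in the frame's current order, hypothetical function name sample_disjoint_pairs), and it is not an exactly uniform draw over all sets of disjoint pairs. It raises an error if too few non-overlapping pairs pass the condition.

```python
import numpy as np
import pandas as pd

# The sample rows from the question.
df = pd.DataFrame(
    {
        "value": [15865, 18827, 29407, 29425, 69407, 59425, 88407, 96425],
        "distance": [1.635450, 4.161332, 0.708055, 0.657369,
                     1.708055, 0.857369, 0.708055, 2.857369],
    },
    index=pd.Index(
        ["cg13869341", "cg14008030", "cg12045430", "cg20826792",
         "cg33045430", "cg40826792", "cg47454306", "cg60826792"],
        name="index",
    ),
)


def sample_disjoint_pairs(frame: pd.DataFrame, n: int, max_value_diff: float = 4000,
                          seed=None) -> pd.DataFrame:
    """Hypothetical helper: n consecutive pairs with no row used twice."""
    values = frame["value"].to_numpy()
    # Positions i whose pair (i, i + 1) meets the (assumed absolute) value condition.
    starts = np.flatnonzero(np.abs(np.diff(values)) <= max_value_diff)
    rng = np.random.default_rng(seed)
    rng.shuffle(starts)  # visit candidate pairs in random order
    used = np.zeros(len(frame), dtype=bool)
    chosen = []
    for i in starts:
        # Accept the pair only if neither of its rows is already in the sample.
        if not used[i] and not used[i + 1]:
            used[i] = used[i + 1] = True
            chosen.append(i)
            if len(chosen) == n:
                break
    if len(chosen) < n:
        raise ValueError(f"only {len(chosen)} non-overlapping pairs are available")
    chosen = np.asarray(chosen)
    first, second = frame.iloc[chosen], frame.iloc[chosen + 1]
    return pd.DataFrame(
        {
            "id_1": first.index.to_numpy(),
            "id_2": second.index.to_numpy(),
            "value_diff": second["value"].to_numpy() - first["value"].to_numpy(),
            "distance_diff": second["distance"].to_numpy() - first["distance"].to_numpy(),
        }
    )


result = sample_disjoint_pairs(df, n=2, seed=0)
print(result)
```

On the full frame this would be sample_disjoint_pairs(df, n=20000), which yields 40000 distinct rows provided enough pairs satisfy the 4000 condition.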