I realize my title is a bit confusing, but I think I can make it clearer by working through an example. I want a vectorized test that checks whether any of the values in a given series are contained in any of the intervals defined by a DataFrame with start and stop columns.
Consider the series valid, which is a column of a DataFrame called trials. Here is what trials looks like:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 156 entries, 0 to 155
Data columns (total 3 columns):
start 156 non-null values
stop 156 non-null values
valid 156 non-null values
dtypes: bool(1), float64(2)
I have a separate DataFrame called `blink`. It has three columns:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41 entries, 0 to 40
Data columns (total 3 columns):
tstart 41 non-null values
tstop 41 non-null values
dur 41 non-null values
dtypes: bool(1), float64(2)
The last column is not directly relevant: it's the duration of the eyeblink, i.e. the difference between tstop and tstart.
I would like to set each row of trials['valid'] to False if the interval from its corresponding trials['start'] to trials['stop'] overlaps with any of the blink['tstart'] to blink['tstop'] intervals.
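To make the intended result concrete, here is a minimal sketch on made-up numbers. The values below are invented for illustration and are not my real data, and I'm assuming closed intervals, so touching endpoints would count as an overlap. The plain nested loop is only there as a reference for the output I expect, not as the approach I want to end up with:

```python
import pandas as pd

# Made-up data for illustration only (not the real trials/blink values).
trials = pd.DataFrame({
    "start": [0.0, 2.0, 5.0, 9.0],
    "stop":  [1.0, 4.0, 6.0, 10.0],
    "valid": [True, True, True, True],
})
blink = pd.DataFrame({
    "tstart": [3.5, 9.5],
    "tstop":  [3.8, 11.0],
})
blink["dur"] = blink["tstop"] - blink["tstart"]

# Reference semantics (assumption: closed intervals): two intervals overlap
# when each one starts no later than the other one ends.
for i, trial in trials.iterrows():
    for _, b in blink.iterrows():
        if trial["start"] <= b["tstop"] and b["tstart"] <= trial["stop"]:
            trials.loc[i, "valid"] = False
            break  # one overlapping blink is enough to invalidate the trial

print(trials["valid"].tolist())
```

With this toy data, I would expect the second and fourth trials to end up with valid set to False, since each of them overlaps one of the two blinks, while the other two trials stay True.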
I could iterate through the rows and use np.arange along with the in operator to do this in a nested loop, but it literally takes hours (my actual data set is much larger than this dummy example). Is there a vectorized approach I could use? If not, is there a faster iteration-based approach?
If anything is unclear, I'll of course be happy to provide additional details.