I have a pandas DataFrame, df, with 4,000,000 timesteps for a single stock.
For each timestep, I want to determine whether the price rises 0.1% or falls 0.1% first. Right now I convert the DataFrame columns to NumPy arrays and loop over every timestep, from 0 to 4,000,000.
For each timestep, I iterate through the following timesteps until I find one where the price has moved 0.1% away from the starting price. If the price rose 0.1% first, the label is 1; if it fell 0.1% first, the label is 0. This takes a very long time, since the nested loop is O(n²) in the worst case.
Is it even possible to vectorize this? I tried to think of a dynamic programming solution to reduce the time complexity, but I'm not sure there is one.
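import numpy as np

target = 0.001  # 0.1% move, expressed as a fraction

# Pull the columns out as NumPy arrays for faster element access
high_bid = df['high_bid'].values
high_ask = df['high_ask'].values
low_bid = df['low_bid'].values
low_ask = df['low_ask'].values
open_bid = df['open_bid'].values
open_ask = df['open_ask'].values

# NaN means neither threshold was reached before the data ended
labels = np.empty(len(df))
labels[:] = np.nan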
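for i in range(len(labels)-1):
    # Scan every later row, including the last one
    for j in range(i+1, len(labels)):
        # Up move: a later high bid reaches the starting ask plus target
        if (open_ask[i] + (open_ask[i]*target) <= high_bid[j]):
            labels[i] = 1
            break
        # Down move: a later low ask reaches the starting bid minus target
        elif (open_bid[i] - (open_bid[i]*target) >= low_ask[j]):
            labels[i] = 0
            break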
df['direction'] = labels
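One possible way to cut down the Python-level looping, offered as a sketch rather than a definitive answer, is to process all rows at once and loop only over the look-ahead offset k, so that each pass compares every still-unlabeled row i against row i + k. The function name `label_first_move` and its signature are hypothetical. The sketch assumes the inputs are NumPy float arrays of equal length, that the scan runs through the final row, and that the rise check takes priority when both thresholds are met in the same row, as in the loop above.

```python
import numpy as np


def label_first_move(open_bid, open_ask, high_bid, low_ask, target=0.001):
    """Return 1 (rose by `target` first), 0 (fell first) or NaN (neither) per row.

    Hypothetical helper: vectorized over starting rows, looping over the
    look-ahead offset k instead of over every (i, j) pair in Python.
    """
    n = len(open_ask)
    labels = np.full(n, np.nan)

    # Same threshold expressions as the nested loop, computed once per row
    up_level = open_ask + open_ask * target
    down_level = open_bid - open_bid * target

    pending = np.arange(n)  # rows that still need a label
    k = 1                   # current look-ahead offset
    while True:
        # Drop rows whose look-ahead would run past the end of the data
        pending = pending[pending + k < n]
        if pending.size == 0:
            break
        j = pending + k
        up = up_level[pending] <= high_bid[j]
        # The rise check wins if both conditions hold in the same row
        down = ~up & (down_level[pending] >= low_ask[j])
        labels[pending[up]] = 1
        labels[pending[down]] = 0
        # Keep only rows that are still unresolved for the next offset
        pending = pending[~(up | down)]
        k += 1
    return labels
```

The number of passes equals the longest look-ahead any row needs, so this is only likely to help when most rows resolve within a few steps. Rows that never resolve stay pending until their offset runs off the end of the data, although each of those later passes only touches the few rows that remain.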
Example
                 time  open_bid  open_ask  high_bid  high_ask  low_bid  \
0 2006-09-19 12:00:00   1.26606   1.26621   1.27063   1.27078  1.26504   
1 2006-09-19 13:00:00   1.27010   1.27025   1.27137   1.27152  1.26960   
2 2006-09-19 14:00:00   1.27076   1.27091   1.27158   1.27173  1.26979   
3 2006-09-19 15:00:00   1.27008   1.27023   1.27038   1.27053  1.26708   
4 2006-09-19 16:00:00   1.26816   1.26831   1.26821   1.26836  1.26638   
5 2006-09-19 17:00:00   1.26648   1.26663   1.26762   1.26777  1.26606   
6 2006-09-19 18:00:00   1.26756   1.26771   1.26781   1.26796  1.26733   
7 2006-09-19 19:00:00   1.26763   1.26778   1.26785   1.26800  1.26754   
8 2006-09-19 20:00:00   1.26770   1.26785   1.26825   1.26840  1.26765   
9 2006-09-19 21:00:00   1.26781   1.26796   1.26791   1.26806  1.26703   
   low_ask  direction  
0  1.26519          1  
1  1.26975          1  
2  1.26994          0  
3  1.26723          0  
4  1.26653          0  
5  1.26621          1  
6  1.26748        NaN  
7  1.26769        NaN  
8  1.26780        NaN  
9  1.26718        NaN  
I want to add that direction column for all 4 million rows.
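To make the example reproducible, here is a minimal sketch that rebuilds the ten rows above and applies the same nested-loop labeling. It assumes target = 0.001 (0.1%) and a scan that runs through the final row. The function name `label_direction_loop` is hypothetical, and the high_ask and low_bid columns are left out because the loop never reads them.

```python
import numpy as np
import pandas as pd

# The ten example rows, restricted to the columns the labeling loop uses
df = pd.DataFrame({
    "time": pd.Timestamp("2006-09-19 12:00:00")
            + pd.to_timedelta(np.arange(10), unit="h"),
    "open_bid": [1.26606, 1.27010, 1.27076, 1.27008, 1.26816,
                 1.26648, 1.26756, 1.26763, 1.26770, 1.26781],
    "open_ask": [1.26621, 1.27025, 1.27091, 1.27023, 1.26831,
                 1.26663, 1.26771, 1.26778, 1.26785, 1.26796],
    "high_bid": [1.27063, 1.27137, 1.27158, 1.27038, 1.26821,
                 1.26762, 1.26781, 1.26785, 1.26825, 1.26791],
    "low_ask": [1.26519, 1.26975, 1.26994, 1.26723, 1.26653,
                1.26621, 1.26748, 1.26769, 1.26780, 1.26718],
})


def label_direction_loop(frame, target=0.001):
    """Reference nested-loop labeling (hypothetical helper name)."""
    open_bid = frame["open_bid"].to_numpy()
    open_ask = frame["open_ask"].to_numpy()
    high_bid = frame["high_bid"].to_numpy()
    low_ask = frame["low_ask"].to_numpy()
    n = len(frame)
    labels = np.full(n, np.nan)  # NaN: neither threshold reached
    for i in range(n - 1):
        for j in range(i + 1, n):
            if open_ask[i] + open_ask[i] * target <= high_bid[j]:
                labels[i] = 1  # rose by target first
                break
            elif open_bid[i] - open_bid[i] * target >= low_ask[j]:
                labels[i] = 0  # fell by target first
                break
    return labels


df["direction"] = label_direction_loop(df)
print(df[["time", "direction"]])
```

On these ten rows the direction column comes out as 1, 1, 0, 0, 0, 1 followed by four NaN values, matching the table above.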
 
     
    