My question is a bit involved, so please bear with me.
I am dealing with the following case: I have two financial time-series data sets from two exchanges (New York and London).
Both data sets look like the following:
London data set:
Date        time.second Price
2015-01-05  32417   238.2
2015-01-05  32418   238.2
2015-01-05  32421   238.2
2015-01-05  32422   238.2
2015-01-05  32423   238.2
2015-01-05  32425   238.2
2015-01-05  32427   238.2
2015-01-05  32431   238.2
2015-01-05  32435   238.47
2015-01-05  32436   238.47
New York data set:
NY.Date     Time    Price
2015-01-05  32416   1189.75
2015-01-05  32417   1189.665
2015-01-05  32418   1189.895
2015-01-05  32419   1190.15
2015-01-05  32420   1190.075
2015-01-05  32421   1190.01
2015-01-05  32422   1190.175
2015-01-05  32423   1190.12
2015-01-05  32424   1190.14
2015-01-05  32425   1190.205
2015-01-05  32426   1190.2
2015-01-05  32427   1190.33
2015-01-05  32428   1190.29
2015-01-05  32429   1190.28
2015-01-05  32430   1190.05
2015-01-05  32432   1190.04
As can be seen, there are three columns: date, time (in seconds), and price.
What I am trying to do is, using the London data set as a reference, find for each of its rows the item in the New York data set that is nearest but earlier in time.
What do I mean by "nearest but earlier"? For instance, given the row
"2015-01-01","21610","15.6871" in the London data set, I want to find the New York row with the same date and the nearest earlier-or-equal time. It may help to look at my current program:
# I am trying to avoid using a for-loop
temp <- matrix(0, nrow = dim(london_data)[1], ncol = 2) # preallocate the results
for(i in 1:dim(london_data)[1]){ # for each row in the London data set
    print(i)
    tempRow<-london_data[i,]
    dateMatch<-(which(NY_data[,1]==tempRow[1])) # select the same date
    dataNeeded<-(NY_data[dateMatch,]) # subset the same-date data
    # find the nearest-but-earlier rows in the NY data set
    Found<-dataNeeded[which(dataNeeded[,2]<=tempRow[2]),]
    # Found may be more than one row, each row is of length 3
    if(length(Found)>3)
    {    # Select the data; we only need "time" and "price", the 2nd and 3rd
         # columns; the nearest match is in the final row of **Found**
         temp[i,]<-Found[dim(Found)[1],2:3]
    }
    else if(length(Found)==3){ # Found is a single row of length 3
        temp[i,]<-Found[2:3] # just insert what we want
    }
    else{ # nothing found, just insert 0 and 0
        temp[i,]<-c(0,0)
    }
    print(paste("time is", as.numeric(temp[i,1]))) # Monitor the loop
 }
 res<-cbind(london_data,temp)
 colnames(res)<-c("LondonDate","LondonTime","LondonPrice","NYTime","NYPrice")
The correct output for the data set listed above is **(only partially shown)**:
      "LondonDate","LondonTime","LondonPrice","NYTime","NYPrice"
 [1,] "2015-01-05" "32417"      "238.2"       "32417"    "1189.665" 
 [2,] "2015-01-05" "32418"      "238.2"       "32418"    "1189.895" 
 [3,] "2015-01-05" "32421"      "238.2"       "32421"    "1190.01"  
 [4,] "2015-01-05" "32422"      "238.2"       "32422"    "1190.175" 
 [5,] "2015-01-05" "32423"      "238.2"       "32423"    "1190.12"  
 [6,] "2015-01-05" "32425"      "238.2"       "32425"    "1190.205" 
 [7,] "2015-01-05" "32427"      "238.2"       "32427"    "1190.33"  
 [8,] "2015-01-05" "32431"      "238.2"       "32430"    "1190.05"  
 [9,] "2015-01-05" "32435"      "238.47"      "32432"    "1190.04"  
 [10,] "2015-01-05" "32436"      "238.47"      "32432"    "1190.04"
My problem is that the London data set has more than 5,000,000 rows. I tried to avoid the for-loop but still need at least one; the program above runs successfully, but it takes about 24 hours.
How can I avoid the for-loop and speed the program up?
Your kind help will be much appreciated.
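For reference, here is a minimal sketch of the kind of vectorised lookup I am hoping for, using base R's `findInterval` on the sample data above. It assumes each day's New York times are sorted in ascending order, and the column vectors (`ny_time`, `ny_price`, `ldn_time`) are just the sample values typed in by hand; I am not sure how to extend this efficiently across many dates:

```r
# findInterval(x, vec) returns, for each element of x, the index of the
# largest element of the sorted vector vec that is <= x (0 if none exists),
# which is exactly the "nearest but earlier or equal" rule.
ny_time  <- c(32416, 32417, 32418, 32419, 32420, 32421, 32422, 32423,
              32424, 32425, 32426, 32427, 32428, 32429, 32430, 32432)
ny_price <- c(1189.75, 1189.665, 1189.895, 1190.15, 1190.075, 1190.01,
              1190.175, 1190.12, 1190.14, 1190.205, 1190.2, 1190.33,
              1190.29, 1190.28, 1190.05, 1190.04)
ldn_time <- c(32417, 32418, 32421, 32422, 32423, 32425, 32427, 32431,
              32435, 32436)

idx <- findInterval(ldn_time, ny_time)          # matching NY index per London row
matched_time  <- ifelse(idx > 0, ny_time[idx], 0)   # 0 when no earlier row exists
matched_price <- ifelse(idx > 0, ny_price[idx], 0)
```

On this one-day sample, `matched_time` and `matched_price` reproduce the "NYTime"/"NYPrice" columns of the expected output shown above (e.g. London time 32431 maps to NY time 32430, price 1190.05).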