I have two DataFrames, shown below.
DataFrame A (about 50k rows):
| ID | Timestamp | 
|---|---|
| 1 | 2021/4/28 01:00:00 | 
| 2 | 2021/4/27 01:03:02 | 
| 1 | 2021/4/28 02:05:01 | 
| ... | ... | 
And DataFrame B (about 16M rows):
| ID | Timestamp | Activities | 
|---|---|---|
| 1 | 2021/4/28 00:59:58 | 30 | 
| 2 | 2021/4/27 01:02:58 | 27 | 
| 1 | 2021/4/28 02:04:07 | 44 | 
| 1 | 2021/4/28 02:04:08 | 45 | 
| ... | ... | ... | 
For each row in A, I need to find the Activities value in B that has the same ID and the closest Timestamp, and insert it into a new column of A.
Right now I'm solving this with a simple loop. It works, but it's too slow. Here is the pseudocode:
- Iterate over each row of A and get its ID and Timestamp
- filter B by that ID, keep only the rows whose Timestamp is less than or equal to A's Timestamp, sort them by Timestamp in descending order and take the first Activities value
- insert that value into A's new column in the current row
- drop the row of B that was just used
- move on to the next row
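
For reference, the steps above can be sketched roughly as follows. This is a minimal reproduction using only the sample rows from the tables (timestamps rewritten in ISO form); the names `a`, `b` and `match_with_loop` are placeholders for illustration, not my real code.

```python
import pandas as pd

# Sample rows from the tables above (names `a` and `b` are placeholders).
a = pd.DataFrame({
    "ID": [1, 2, 1],
    "Timestamp": pd.to_datetime([
        "2021-04-28 01:00:00", "2021-04-27 01:03:02", "2021-04-28 02:05:01",
    ]),
})
b = pd.DataFrame({
    "ID": [1, 2, 1, 1],
    "Timestamp": pd.to_datetime([
        "2021-04-28 00:59:58", "2021-04-27 01:02:58",
        "2021-04-28 02:04:07", "2021-04-28 02:04:08",
    ]),
    "Activities": [30, 27, 44, 45],
})

def match_with_loop(a, b):
    """Row-by-row lookup, following the pseudocode steps (slow on large data)."""
    remaining = b.copy()
    values = []
    for row in a.itertuples(index=False):
        # Same ID, and B's Timestamp at or before A's Timestamp.
        candidates = remaining[
            (remaining["ID"] == row.ID) & (remaining["Timestamp"] <= row.Timestamp)
        ]
        if candidates.empty:
            values.append(None)  # no earlier B row for this ID
            continue
        # The latest qualifying Timestamp is the first row of a descending sort.
        best = candidates["Timestamp"].idxmax()
        values.append(remaining.at[best, "Activities"])
        # Drop the B row that was just used.
        remaining = remaining.drop(index=best)
    out = a.copy()
    out["Activities"] = values
    return out

result = match_with_loop(a, b)
print(result)
```

Every iteration scans all of B for the boolean filter, so the cost grows roughly with (rows in A) x (rows in B), which is presumably why it is so slow on the real data.
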
But as I mentioned, this is too slow. How can I speed up this procedure?
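
One vectorized direction that might fit is `pandas.merge_asof`, which joins on the nearest key within groups. The sketch below is only an assumption about how it could be applied here, again on the sample rows; the helper column `row` is a made-up name used to restore A's original order. Note that it does not reproduce the "drop the row of B that was just used" step, so one B row could be matched to several A rows.

```python
import pandas as pd

# Sample rows from the tables above (names `a` and `b` are placeholders).
a = pd.DataFrame({
    "ID": [1, 2, 1],
    "Timestamp": pd.to_datetime([
        "2021-04-28 01:00:00", "2021-04-27 01:03:02", "2021-04-28 02:05:01",
    ]),
})
b = pd.DataFrame({
    "ID": [1, 2, 1, 1],
    "Timestamp": pd.to_datetime([
        "2021-04-28 00:59:58", "2021-04-27 01:02:58",
        "2021-04-28 02:04:07", "2021-04-28 02:04:08",
    ]),
    "Activities": [30, 27, 44, 45],
})

# merge_asof requires both frames to be sorted by the "on" key.
# `row` is a hypothetical helper column that remembers A's original order.
a_sorted = a.assign(row=range(len(a))).sort_values("Timestamp")
b_sorted = b.sort_values("Timestamp")

merged = pd.merge_asof(
    a_sorted,
    b_sorted,
    on="Timestamp",
    by="ID",               # only match rows with the same ID
    direction="backward",  # B's Timestamp <= A's Timestamp
)

# Put the rows back in A's original order and drop the helper column.
result = (
    merged.sort_values("row")
    .drop(columns="row")
    .reset_index(drop=True)
)
print(result)
```

With `direction="backward"` the match is the latest B row at or before A's Timestamp, as in the loop; `direction="nearest"` would instead pick the closest Timestamp on either side.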
