I have written the code shown below. There are two pandas dataframes: df contains the columns timestamp_milli and pressure, and df2 contains the columns timestamp_milli and acceleration_z. Both dataframes have around 100'000 rows. For each row of df, I search df2 for the rows whose timestamp lies within a fixed range of that row's timestamp (±5 in the code) and, among those, keep the ones with the minimal time difference.
Unfortunately, the code is extremely slow. Moreover, I get the following warning, which originates from the line df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
How can I speed up the code and resolve the warning?
acceleration = []
pressure = []
for index, row in df.iterrows():
    mask = (df2["timestamp_milli"] >= (row["timestamp_milli"] - 5)) & (df2["timestamp_milli"] <= (row["timestamp_milli"] + 5))
    df_temp = df2[mask]
    # Select closest point
    if len(df_temp) > 0:
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"].abs()
        df_temp = df_temp.loc[df_temp["timestamp_milli"] == df_temp["timestamp_milli"].min()]
        for index2, row2 in df_temp.iterrows():
            pressure.append(row["pressure"])
            acc = row2["acceleration_z"]
            acceleration.append(acc)
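
One possible way to express the same nearest-within-range lookup without a Python loop is pandas.merge_asof with direction="nearest" and tolerance=5. The following is only a sketch under some assumptions: both frames can be sorted by timestamp_milli, the column has the same integer dtype in both, and a single match per row of df is acceptable. When several rows of df2 are equally close, merge_asof returns only one of them, whereas the loop above appends all of them. The sample values are invented for illustration.

```python
import pandas as pd

# Invented sample data with the same columns as described in the question.
df = pd.DataFrame({
    "timestamp_milli": [1000, 1010, 1020, 1100],
    "pressure": [101.3, 101.4, 101.2, 101.5],
})
df2 = pd.DataFrame({
    "timestamp_milli": [998, 1003, 1012, 1050],
    "acceleration_z": [9.79, 9.81, 9.80, 9.83],
})

# merge_asof requires both frames to be sorted by the key column.
left = df.sort_values("timestamp_milli")
right = df2.sort_values("timestamp_milli")

# For each row of df, take the row of df2 with the nearest timestamp,
# but only if it is no more than 5 away; otherwise acceleration_z is NaN.
merged = pd.merge_asof(
    left,
    right,
    on="timestamp_milli",
    direction="nearest",
    tolerance=5,
)

# Drop rows of df without a partner, mirroring the `if len(df_temp) > 0` check.
matched = merged.dropna(subset=["acceleration_z"])

pressure = matched["pressure"].tolist()
acceleration = matched["acceleration_z"].tolist()
```

Because this version never assigns into a slice of df2, the SettingWithCopyWarning does not come up. If the loop is kept instead, taking an explicit copy of the filtered rows, for example df_temp = df2[mask].copy(), is the usual way to make the assignment unambiguous.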