I would like to efficiently merge two data frames into one, but one data frame has "more data" than the other. Example:
df_A = pd.DataFrame({"Time": [pd.to_datetime("09:11:37.600"),
                              pd.to_datetime("09:11:37.700"),
                              pd.to_datetime("09:11:37.800")],
                    "A": [0.1, 0.7, -1.1]})
df_B = pd.DataFrame({"Time": [pd.to_datetime("09:11:37.610"),
                              pd.to_datetime("09:11:37.640"),
                              pd.to_datetime("09:11:37.670"),
                              pd.to_datetime("09:11:37.700"),
                              pd.to_datetime("09:11:37.730"),
                              pd.to_datetime("09:11:37.760"),
                              pd.to_datetime("09:11:37.790"),
                              pd.to_datetime("09:11:37.820")],
                    "B": [0.3, -1.5, -0.5, 0.2, 1.2, -0.9, 0.1, -0.2]})
I would like to create a third data frame, df_C, that is an augmented copy of the shortest previous data frame. The augmentation is given by the "smoothed" columns from the longest data frame. The smoothing can be performed by an averaging operation, or another one. In this example, I would like to do the following:
- For the row corresponding to pd.to_datetime("09:11:37.600"), average out the values0.3, -1.5, -0.5because they correspond to the times betweenpd.to_datetime("09:11:37.600")and the time in the next row,pd.to_datetime("09:11:37.700");
- For the corresponding to pd.to_datetime("09:11:37.700"), average out the values0.2, 1.2, -0.9, 0.1because they correspond to the times betweenpd.to_datetime("09:11:37.700")and the time in the next row,pd.to_datetime("09:11:37.800");
- And so on.
The data frame df_C would then have three columns: Time, A, and B, where Time and A come from df_A, and B is the "smoothed" column from df_B following the procedure above.
Is there a way to do this without explicitly writing a for loop, which can be expensive for very long data frames?
I tried the following, but it's copying the same value to all dates (i.e., it's not correct).
df_C = df_A.copy()
df_C.loc[:, "B"] = df_B.loc[(df_B["Time"] >= df_A.shift(1)["Time"].values[1]) & (df_B["Time"] < df_A.shift(-1)["Time"].values[0]), "B"].mean()
 
    