I have a large dataset (1 million+ rows, several GB) that changes on a regular basis. It models flow characteristics by relating each entry to its upstream neighbor in a flow network. The basic logic I want in the tool is: take each row's ID for its upstream device, find the entry for that device, and copy a number stored in a different column (Num2) of that upstream entry into the same column of the original row. This lets me determine which "level" of the flow network each device is at. A sample of the data is shown in the output further down.
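In other words, for a single row the intended lookup is essentially this (a toy sketch of the idea, not my real code; i is just some row index):

up_id = df.at[i, "Up"]                       # ID of the upstream device
upstream = df[df["DevNo"] == up_id]          # find that device's entry
df.at[i, "Num2"] = upstream["Num2"].iloc[0]  # copy its Num2 into this row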
Here is the code I am using:
import pandas as pd
import numpy as np
# Read the CSV file into a Pandas DataFrame
df = pd.read_csv("L:\\Dev_h\\xxxxx.csv")
# Seed the Num2 field from Num1 wherever Num1 is populated:
df['Num2'] = np.where(df['Num1'] > 0, df['Num1'], df['Num2'])
print(df)
# Collect the rows whose Num2 is still null after seeding
null_num2_rows = df[df["Num2"].isnull()]
# For each row with a null Num2 field, find the row with the same ID and a non-null Num2 field
for row_index, row_data in null_num2_rows.iterrows():
    # Get the Upstream device field from the current row
    up_field = row_data["Up"]
    # Find the row with the same ID and a non-null Num2 field
    matched_row = df.loc[(df["DevNo"] == up_field) & ~df["Num2"].isnull()]
    # Set the Num2 field of the current row with the Num2 field of the matched row
    df.at[row_index, "Num2"] = matched_row["Num2"].iloc[0]
print(df)
# Save the DataFrame to an Excel file
df.to_excel("L:\\ROAD ROUTING\\xxxxx.xlsx")
This seems to work fine as an approach for a very small file; here is an example of the output.
DevNo  Up  Down  Num1  Num2
F1         S1    1     1
F2     S1  S2    2     2
F3     S4  S6    3     3
F4     S8        4     4
S1     F1  F2          1
S2     F2  S4          2
S3     F2  S5          2
S4     S2  F3          2
S5     S3  T1          2
S6     F3  S6          3
S7     S6  S8          3
S8     S7  F4          3
However, it scales terribly on a large dataset and maxes out my memory. I am very new to Python, so I don't really know how to restructure my logic to accommodate a dataset this size. Reading the file in chunks with pandas doesn't work, because a row's matching upstream device may not be located in the same chunk as the row searching for it.
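To illustrate that problem, here is a rough sketch of what I understand the chunked version would look like (the chunk size is arbitrary); each chunk can only look up devices that happen to land inside it, so rows whose upstream device sits in another chunk are never matched:

import pandas as pd
import numpy as np

pieces = []
for chunk in pd.read_csv("L:\\Dev_h\\xxxxx.csv", chunksize=100_000):
    # Seed Num2 from Num1, as in the full-file version
    chunk["Num2"] = np.where(chunk["Num1"] > 0, chunk["Num1"], chunk["Num2"])
    # This lookup table only contains the devices in the current chunk
    lookup = chunk.set_index("DevNo")["Num2"]
    missing = chunk["Num2"].isnull()
    chunk.loc[missing, "Num2"] = chunk.loc[missing, "Up"].map(lookup)
    pieces.append(chunk)
# Rows whose upstream device fell in a different chunk are still null here
df = pd.concat(pieces)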
How should I update my approach to handle a large dataset more efficiently?