I am getting data from an API using Python and then transforming this data into a Pandas Dataframe. This dataframe has a column Date and more 50 columns. I have retrieved the data from the whole past week.
Using SQLAlchemy, I created a table in Azure SQL Server and loaded the dataframe containing the data from the whole past week using df.to_sql().
df.to_sql('table', engine, index=False, dtype=types)
Any past date from this API can be changed.
For example, I stored 60k rows with the column Date as of 05/03/2023, but if I query the API tomorrow and ask for this specific date (05/03/2023), there may be 62k rows (2k new rows) and even rows that changed. So, I want to do an upsert based on a selected date (such as 05/03/2023).
I have tried following this: How to upsert pandas DataFrame to Microsoft SQL Server table?
So, I created a #temp_table to store the new dataframe with the updated data from 05/03/2023 and tried to run the query below that was based in the other question I mentioned.
OBS: the columns provided in the example below are just for giving an example. In real life, I have 51 columns (Date + 50).
OBS 2: I don't have a primary key, so in the ON I am comparing the dates.
# Just to be clear, this is how I am defining my engine variable with SQLAlchemy
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect=%s' % url_db,fast_executemany=True,use_setinputsizes=False)
# Creating #temp_table with new df
new_df.to_sql('#temp_table', engine,if_exists='replace', index=False, dtype=types)
# Creating the upsert query
query = (
f"MERGE table WITH (HOLDLOCK) AS main "
f"USING (SELECT date, name, age FROM #temp_table) AS temp "
f"ON ([main].[date] = [temp].[date]) "
f"WHEN MATCHED THEN "
f"UPDATE SET [main].[date] = [temp].[date], [main].[name] = [temp].[name], [main].[age] = [temp].[age]"
f"WHEN NOT MATCHED THEN "
f"INSERT (date, name, age) VALUES ([temp].[date], [temp].[name], [temp].[age]);"
)
# Running the query
with engine.begin() as conn:
try:
conn.execute(text(query)) #GETING STUCK HERE!!!
conn.commit()
except Exception as e:
print(str(e))
conn.rollback()
I have done some print() and found out that everything is running properly, but it gets stuck in the conn.execute(text(query)). The code has been running for 30min and it didn't finish.
Is there anything wrong with my code? Or is it caused by the large amount of data? How can I optimize it?