I have a large dataset that I pulled from Data.Medicare.gov (https://data.medicare.gov/Physician-Compare/Physician-Compare-National-Downloadable-File/mj5m-pzi6)
It's a cvs of all physicians (2.4 million rows by 41 columns, 750MB), lets call this physician_df, however, I cannot load into memory on my computer (memory error). 
I have another df loaded in memory (summary_df) and I want to join columns (NPI, Last Name, First Name) from physician_df.
Is there any way to do this without having to load the data to memory? I first attempted by using their API but I get capped out (I have about 500k rows in my final df and this will always be changing). Would storing the physician_df into a SQL database make this easier?
Here are snippets of each df (fyi, the summary_df is all fake information).
summary_df
DOS        Readmit    SurgeonNPI   
1-1-2018   1          1184809691   
2-2-2018   0          1184809691   
2-5-2017   1          1093707960   
physician_df
NPI          PAC ID      Professional Enrollment   LastName FirstName
1184809691   2668563156  I20120119000086           GOLDMAN  SALUJA
1184809691   4688750714  I20080416000055           NOLTE    KIMBERLY
1093707960   7618879354  I20040127000771           KHANDUJA KARAMJIT
Final df:
DOS        Readmit    SurgeonNPI  LastName FirstName
1-1-2018   1          1184809691  GOLDMAN  SALUJA
2-2-2018   0          1184809691  GOLDMAN  SALUJA
2-5-2017   1          1093707960  KHANDUJA KARAMJIT
If I could load the physician_df then I would use the below code..
pandas.merge(summary_df, physician_df, how='left', left_on=['SurgeonNPI'], right_on=['NPI'])
 
     
     
    