I have two datasets:
Dict
| Number_ID | Price | StartDate | EndDate | 
|---|---|---|---|
| 1 | 3.49 | 20200101 | 20991231 | 
| 2 | 0.25 | 20200101 | 20200502 | 
| 2 | 0.49 | 20200503 | 20200509 | 
| 3 | 0.13 | 20200401 | 20200403 | 
| 3 | 0.15 | 20200404 | 20991231 | 
and
Sales_List
| Number_ID | Date of Sale | 
|---|---|
| 1 | 20200103 | 
| 2 | 20200431 | 
| 2 | 20200505 | 
| 3 | 20200402 | 
| 3 | 20200408 | 
My goal is to use data from Dict table and put right value to Sales_List['Price'] from Dict['Price'] on following conditions:
Number_ID is the same
Sales['Date of Sales'] is between Dict['StartDate'] and Dict['EndDate']
I've tried following code, but i have some error:
import pandas as pd
import numpy as pd
Sales_List['Price'] = np.where((Sales_List['Number_ID'] == Dict['Number_ID']) & 
                               (Sales_List['Date of Sale'] >= Dict['StartDate']) & 
                               (Sales_List['Date of Sale'] < Dict['EndDate']),
                               Dict['Price'],'')
also I have tried following code:
Sales_List = Sales.merge(Dict, how='left', on='Number_ID').reset_index()
Sales_List['Value'] = np.where((Sales_List['Date of Sale'] >= Dict['StartDate']) & 
                               (Sales_List['Date of Sale'] < Dict['EndDate']),1,0)
Sales_List = Sales_List[Sales_List['Value'] = 1]
But in the second example this merge allocates too much memory and crashed it. (Sales_List is more than 30M rows, Dict has 12k rows)
MemoryError: Unable to allocate 7.15 GiB for an array with shape (5, 191907116) and data type int64
Any of you have any idea how I can do something like that in python in more efficient way (than second example)?
Output should looks like:
| Number_ID | Date of Sale | Price | 
|---|---|---|
| 1 | 20200103 | 3.49 | 
| 2 | 20200431 | 0.25 | 
| 2 | 20200505 | 0.49 | 
| 3 | 20200402 | 0.13 | 
| 3 | 20200408 | 0.15 | 
| 3 | 20200401 | 0.13 | 
| 3 | 20200409 | 0.15 | 
| 2 | 20200509 | 0.49 | 
| 2 | 20200508 | 0.49 | 
 
    