So this is my DataFrame:
Basically what's in there:
- There is
Query-Datecolumn, and foreachQuery Datedate, there are 30Check-Indays forward, and foreachCheck-Indate there are 5 days forward.
Note: The time formart is Day/Month/Year
Note: The hotel name is the same for every row, only the Price and Nights columns are different ( and the dates )
- There is
Nightscolumn, it's basically subtraction betweenCheck-outandCheck-In
An example for a DataFrame:
+------------+-----------+-----------+------------+-------+--------+
| Query-Date | Check-In | Check-Out | Hotel Name | Price | Nights |
+------------+-----------+-----------+------------+-------+--------+
| 1/1/2000 | 1/1/2000 | 2/1/2000 | HotelName1 | 10 | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/1/2000 | HotelName1 | 21 | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | ... | .. | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | ... | .. | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | ... | .. | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 2/1/2000 | 3/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 3/1/2000 | 4/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 8/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | ... | | | | |
+------------+-----------+-----------+------------+-------+--------+
| | 30/1/2000 | 31/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 1/2/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 2/2/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/2/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/2/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| 2/1/2000 | 2/1/2000 | 2/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 3/1/2000 | ... | | | |
+------------+-----------+-----------+------------+-------+--------+
Now, in some rows, there are missing dates, so for example we could find something like this:
+------------+-----------+-----------+------------+-------+--------+
| 3/1/2000 | 3/1/2000 | 4/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | 4/1/2000 | 5/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 8/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 9/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
We can notice that for the Query-Date which the value "3/1/2000" and for the Check-In "3/1/2000" there are two missings: the Date "5/1/2000" ( 2 Nights ) and "8/1/2000" ( 5 Nights )
What I want is to add these days, with the same Hotel Name and with the price of the mean of the closest previous row with the same Nights value with the closest forward row with the same Nights value
But this thing is more complicated because the missing can be for a whole Query Date or even few.
So basically I found several topics:
- https://stackoverflow.com/a/19324591
Basically they say that the Date should be the index, and then we can use the
pd.date_rangeandreindex, So what I did is to determine these 3 columns:Query-Date,Check-In,Check-Outto be the index:
pd.set_index(['Query Date', 'Check-In', 'Check-Out'], inplace=True)
I also can find the min and max value of Query-Date, but I couldn't find a way to make a range for 3 columns.
- https://stackoverflow.com/a/44102947/11356272 In this topic there's a code to fill these null gaps the previous and forward one, but it's not really my case, because I don't need the LAST previous and forward rows for the mean, but the the LAST previous and forward rows with the same nights.
I need something like that in pseudo code:
query_date = min(pd['Query Date'])
while(query_date != max(pd['Query Date'])):
for(i in range(0, 30 + 1)):
for(j in range(0,5 + 1)):
if(check if row with Query-Date: query_date
and Check-In: query_date + (i days)
and Check-Out : query_date + (i + j days) is not exists):
add to pd new row with these Query-Date, Check-In, Check-out with price: (The last price value with the same + the forward price value with the same amount) / 2
Hopefully you guys could help me do that.