I have a pandas DataFrame with gaps in its time series.
It looks like this:
Example Input
--------------------------------------
     Timestamp        Close
 2021-02-07 09:30:00  124.624 
 2021-02-07 09:31:00  124.617
 2021-02-07 10:04:00  123.946
 2021-02-07 16:00:00  123.300
 2021-02-09 09:04:00  125.746
 2021-02-09 09:05:00  125.646
 2021-02-09 15:58:00  125.235
 2021-02-09 15:59:00  126.987
 2021-02-09 16:00:00  127.124 
Desired Output
--------------------------------------------
     Timestamp        Close
 2021-02-07 09:30:00  124.624 
 2021-02-07 09:31:00  124.617
 2021-02-07 09:32:00  124.617
 2021-02-07 09:33:00  124.617
   'Insert a line for each minute up to the next available
   timestamp, with the Close value from the last available timestamp'
 2021-02-07 10:03:00  124.617 
 2021-02-07 10:04:00  123.946
 2021-02-07 16:00:00  123.300
   'I don't want lines inserted here, as this date is not
   present in the original dataset (it could be a non-trading
   day, so I don't want to fill this gap)'
 2021-02-09 09:04:00  125.746
 2021-02-09 09:05:00  125.646
 2021-02-09 15:58:00  125.235
   'Fill the gaps here again, but only between 09:30 and 16:00'
 2021-02-09 15:59:00  126.987
 2021-02-09 16:00:00  127.124 
What I have tried is:
# set the index column
df_process.set_index('Exchange DateTime', inplace=True)
# resample to one-minute steps and forward fill the gaps
df_process_out = df_process.resample(rule='1T').ffill()
# filter and return only timestamps between 09:30 and 16:00
df_process_out = df_process_out.between_time(start_time='09:30:00', end_time='16:00:00')
However, if I do it like this, it also resamples and generates new timestamps on dates that do not exist in the original dataframe. In the example above, it would also generate minute-by-minute timestamps for 2021-02-08.
How can I avoid this?
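For illustration, here is a minimal sketch of one possible approach: resample each calendar date on its own, so that dates missing from the data never get rows. It assumes the index is a sorted DatetimeIndex; the names `df`, `parts` and `filled` are hypothetical, the sample values are copied from the example input above, and the `'min'` alias is used in place of `'1T'`.

```python
import pandas as pd

# Sample data copied from the example input above
df = pd.DataFrame(
    {"Close": [124.624, 124.617, 123.946, 123.300,
               125.746, 125.646, 125.235, 126.987, 127.124]},
    index=pd.to_datetime([
        "2021-02-07 09:30:00", "2021-02-07 09:31:00", "2021-02-07 10:04:00",
        "2021-02-07 16:00:00", "2021-02-09 09:04:00", "2021-02-09 09:05:00",
        "2021-02-09 15:58:00", "2021-02-09 15:59:00", "2021-02-09 16:00:00",
    ]),
)
df.index.name = "Timestamp"

# Group rows by calendar date and resample each group separately,
# so minutes are only filled between the first and last row of that date
parts = [
    day.resample("min").ffill()
    for _, day in df.groupby(df.index.normalize())
]
filled = pd.concat(parts)

# Same filter as in the original attempt: keep 09:30 to 16:00 only
filled = filled.between_time("09:30", "16:00")
```

With this sketch, 2021-02-08 never appears in `filled`, because no group exists for it. Note that rows before 09:30 (such as 09:04 and 09:05 on 2021-02-09) are removed by the `between_time` filter.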
Furthermore, is there a better way to avoid resampling over the whole day?
df_process_out = df_process.resample(rule='1T').ffill()
This generates timestamps from 00:00 to 24:00, and in the next line of code I have to filter most of them out again. That doesn't seem efficient.
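As a rough sketch of an alternative that skips the full-day resample, the target index could be built directly: one 09:30 to 16:00 minute grid per date that actually occurs in the data, followed by a forward-filling `reindex`. The names `df`, `days`, `session`, `grid` and `filled` are hypothetical, and the sketch assumes a sorted, duplicate-free DatetimeIndex. One caveat: if a date has no row at or before 09:30, the forward fill would carry over the last value of the previous date.

```python
import pandas as pd

# Sample data copied from the example input above
df = pd.DataFrame(
    {"Close": [124.624, 124.617, 123.946, 123.300,
               125.746, 125.646, 125.235, 126.987, 127.124]},
    index=pd.to_datetime([
        "2021-02-07 09:30:00", "2021-02-07 09:31:00", "2021-02-07 10:04:00",
        "2021-02-07 16:00:00", "2021-02-09 09:04:00", "2021-02-09 09:05:00",
        "2021-02-09 15:58:00", "2021-02-09 15:59:00", "2021-02-09 16:00:00",
    ]),
)
df.index.name = "Timestamp"
df = df.sort_index()  # reindex(method="ffill") needs a monotonic index

# Distinct calendar dates that are present in the data
days = df.index.normalize().unique()

# Minute offsets within one session, 09:30 to 16:00 inclusive
session = pd.timedelta_range(start="09:30:00", end="16:00:00", freq="min")

# One minute grid per existing date; absent dates are never generated
grid = pd.DatetimeIndex(
    [day + offset for day in days for offset in session],
    name="Timestamp",
)

# Look up each grid timestamp, taking the last known Close where missing
filled = df.reindex(grid, method="ffill")
```

Here only session minutes on existing dates are ever created, so nothing has to be filtered out afterwards.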
Any help or guidance would be highly appreciated.
Thanks
Edit:
As requested, here is a small sample set:
df_in: Input data
df_out_error: Wrong Output Data
df_out_OK: What the output data should look like
I prepared a small sample in the following Colab notebook:
https://colab.research.google.com/drive/1Fps2obTv1YPDpTzXTo7ivLI5njoI-y4n?usp=sharing
Note that this is only a small subset of the data. I'm trying to clean multiple years of data that is structured like this and has missing minute timestamps.
 
    

