I'm trying to learn from IoT time-series data. The data comes from two different sources. In some measurements the difference between the sources is small: one source has 11 rows and the other has 15. In other measurements one source has 30 rows and the other has 240.
I thought to interpolate using:
 df.resample('20ms').interpolate()
but saw that it deletes some rows. Is there any method to interpolate without deleting rows, or should I delete rows myself?
EDIT - data and code:
#!/usr/bin/env python3.6
import pandas as pd
from pandas import read_csv

first_df_file_name = 'interpolate_test.in'
df = read_csv(first_df_file_name, header=0, delimiter=' ')
print(df.head(5))

# Attach a synthetic datetime index (100ms apart) so resample() can work.
new_col = pd.date_range('1/1/2011 00:00:00.000000', periods=len(df.index), freq='100ms')
df.insert(loc=0, column='date', value=new_col)
df.set_index('date', inplace=True)

# Upsampling to 20ms works as expected.
upsampled = df.resample('20ms').interpolate()
print('20 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_20ms.out')

# Resampling to 60ms is where original rows disappear.
upsampled = df.resample('60ms').interpolate()
print('60 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_60ms.out')
This is the test input file (interpolate_test.in):
a b
100 200
200 400
300 600
400 800
500 1000
600 1100
700 1200
800 1300
900 1400
1000 2000
Here is part of the output:
 //output of interpolating by 20 millis - this is fine
                         a      b
 date                                 
 2011-01-01 00:00:00.000  100.0  200.0
 2011-01-01 00:00:00.020  120.0  240.0
 2011-01-01 00:00:00.040  140.0  280.0
 2011-01-01 00:00:00.060  160.0  320.0
 2011-01-01 00:00:00.080  180.0  360.0
 60 ms, num rows 16
 //output when interpolating by 60 millis - data is lost
                         a      b
 date                                 
 2011-01-01 00:00:00.000  100.0  200.0
 2011-01-01 00:00:00.060  160.0  320.0
 2011-01-01 00:00:00.120  220.0  440.0
 2011-01-01 00:00:00.180  280.0  560.0
 2011-01-01 00:00:00.240  340.0  680.0
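If I understand correctly, the 60ms grid only coincides with the original 100ms timestamps every 300ms (their least common multiple), so rows like 00:00:00.100 simply never appear in the resampled index. A quick sketch to check this (same synthetic index as in my code):

```python
import pandas as pd

# The synthetic 100ms index from my example, and the 60ms resample grid.
orig = pd.date_range('2011-01-01', periods=10, freq='100ms')
grid = pd.date_range(orig[0], orig[-1], freq='60ms')

# Only timestamps present on both grids keep their original values;
# every other timestamp on 'orig' is absent from the 60ms output.
print(orig.intersection(grid))
```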
So, should I delete rows from the larger source instead of interpolating? And if I interpolate, how can I avoid losing data?
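One workaround I'm experimenting with (a sketch using the same hypothetical 100ms index as in my edit; not sure it's the idiomatic way) is to reindex onto the union of the original index and the target grid before interpolating, so no original timestamp is dropped:

```python
import pandas as pd

# Same hypothetical setup as in my edit: 10 rows on a 100ms grid.
idx = pd.date_range('2011-01-01', periods=10, freq='100ms')
df = pd.DataFrame({'a': range(100, 1100, 100)}, index=idx)

# Target 60ms grid over the same span.
target = pd.date_range(df.index[0], df.index[-1], freq='60ms')

# Reindex onto the union of both grids and interpolate by elapsed time:
# all 10 original timestamps survive, and the 60ms points are filled in.
combined = df.reindex(df.index.union(target)).interpolate(method='time')
print(len(combined))
```

This keeps all original rows plus the new 60ms points; if only the target grid is wanted afterwards, `combined.reindex(target)` would select it without having interpolated past the original samples.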
