I have a data frame, as shown in the image, and I want to take the mean along the column 'trial'. That is, for every subject, condition and sample (for example, the rows where all three of these columns have the value 1), I want to average the data along the trial column (100 rows).
What I have done in pandas is the following:
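import pandas as pd

sub_erp_pd = pd.DataFrame()
for j in range(1, 4):  # conditions 1..3
    sub_c = subp[subp['condition'] == j]
    for i in range(1, 3073):  # samples 1..3072
        # mean over all rows (trials) for this condition and sample
        sub_erp_pd = sub_erp_pd.append(sub_c[sub_c['sample'] == i].mean(), ignore_index=True)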
But this takes a lot of time, so I am thinking of using dask instead of pandas. In dask, however, I am having trouble creating an empty data frame, the way we create an empty data frame in pandas and append data to it.
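For reference, the nested loops above amount to a grouped mean over trials. Here is a minimal sketch of the same computation as a single `groupby` call, run on made-up data. The frame built below is only a hypothetical stand-in for `subp` (the channel names `ch0` and `ch1` and the small sizes are assumptions, not the real data layout):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `subp`: 1 subject, 3 conditions, 5 trials, 4 samples.
# The real data has 100 trials and 3072 samples; sizes are reduced for illustration.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [[1], [1, 2, 3], range(1, 6), range(1, 5)],
    names=["subject", "condition", "trial", "sample"],
)
subp = index.to_frame(index=False)
subp["ch0"] = rng.normal(size=len(subp))  # assumed data column
subp["ch1"] = rng.normal(size=len(subp))  # assumed data column

# Average over trials: one output row per (subject, condition, sample).
sub_erp_pd = (
    subp.drop(columns="trial")
    .groupby(["subject", "condition", "sample"], as_index=False)
    .mean()
)
```

This avoids both the repeated boolean filtering and the row-by-row append, which is probably where most of the time goes in the loop version.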
EDIT
As suggested by @edesz, I changed my approach:
%%time
sub_erp = pd.DataFrame()
for subno in progressbar.progressbar(range(1, 82)):
    try:
        sub = pd.read_csv('../input/data/{}.csv'.format(subno, subno), header=None)
    except:
        sub = pd.read_csv('../input/data/{}.csv'.format(subno, subno), header=None)
    sub_erp = sub_erp.append(sub.groupby(['condition', 'sample'], as_index=False).mean())
Reading a file with pandas takes 13.6 seconds, while reading a file with dask takes 61.3 ms. But in dask, I am having trouble appending.
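One pattern that sidesteps the "empty data frame plus append" problem altogether is to collect the per-subject results in a plain Python list and concatenate once at the end. The sketch below shows this in pandas on made-up data. `load_subject` is a hypothetical stand-in for the `read_csv` call, and `ch0` is an assumed data column. (`DataFrame.append` was removed in pandas 2.0, so `pd.concat` is the supported route in any case; dask.dataframe also provides a `concat` function, though this sketch does not use dask.)

```python
import numpy as np
import pandas as pd


def load_subject(subno):
    """Hypothetical stand-in for reading one subject's CSV file."""
    rng = np.random.default_rng(subno)
    index = pd.MultiIndex.from_product(
        [[subno], [1, 2, 3], range(1, 4), range(1, 5)],
        names=["subject", "condition", "trial", "sample"],
    )
    df = index.to_frame(index=False)
    df["ch0"] = rng.normal(size=len(df))  # assumed data column
    return df


parts = []  # one averaged frame per subject
for subno in range(1, 4):  # 3 subjects here; the real loop covers 81
    sub = load_subject(subno)
    parts.append(
        sub.drop(columns="trial")
        .groupby(["subject", "condition", "sample"], as_index=False)
        .mean()
    )

# Concatenate once instead of appending inside the loop.
sub_erp = pd.concat(parts, ignore_index=True)
```

On the timing comparison: dask's `read_csv` is lazy, so the 61.3 ms most likely covers only setting up the read, not loading the data, which happens when the result is actually computed.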
NOTE - The original question was titled "Create an empty dask dataframe and append values to it".