I have many Excel files in a directory, each with the same structure -- for example, the data below could be test1.xlsx:
Date      Type     Name      Task       Subtask       Hours
3/20/16   Type1    Name1     TaskXyz    SubtaskXYZ    1.00  
3/20/16   Type1    Name2     TaskXyz    SubtaskXYZ    2.00  
3/20/16   Type1    Name3     TaskXyz    SubtaskXYZ    1.00  
What I would like to do is create a new Excel file with the file name and sum of each file in the directory that would look like this:
File Name     Sum of hours
test1.xlsx    4
test2.xlsx    10
...           ...
I just started playing around with glob, and that has been helpful for creating one large dataframe like this:
import glob
import pandas as pd

frames = []
for f in glob.glob("path/*.xlsx"):
    df = pd.read_excel(f, skiprows=4, index_col=None, na_values=["NA"])
    frames.append(df)
all_data = pd.concat(frames, ignore_index=True)
This has been helpful for creating one dataframe of all the data, regardless of which file it came from, and I have been able to use groupbys to analyze the data at a macro level. But as far as I know, I cannot sum per file within the dataframe; I can only do things like:
task_output = all_data.groupby(["Task", "Subtask"])["Hours"].agg(["sum", "mean"])
where I am summing and averaging over the whole dataframe rather than over each individual file.
Any ideas on where to start with this?
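One approach, sketched below, is to tag each row with the name of the file it was read from while looping, then group on that tag column to get per-file totals and write them to a new workbook. The `"path/*.xlsx"` glob, the `"summary.xlsx"` output name, and the `"File Name"` column are placeholders; it assumes each file has an `Hours` column as in the sample data.

```python
import glob
import os

import pandas as pd

# Read every workbook, tagging each row with the file it came from.
frames = []
for f in glob.glob("path/*.xlsx"):
    df = pd.read_excel(f, skiprows=4, index_col=None, na_values=["NA"])
    df["File Name"] = os.path.basename(f)
    frames.append(df)

# Guard against an empty directory so pd.concat does not raise.
if frames:
    all_data = pd.concat(frames, ignore_index=True)
else:
    all_data = pd.DataFrame(columns=["File Name", "Hours"])

# Per-file totals: group on the tag column instead of Task/Subtask.
summary = all_data.groupby("File Name", as_index=False)["Hours"].sum()
summary = summary.rename(columns={"Hours": "Sum of hours"})

# Write the per-file totals to a new workbook (skipped if no files matched).
if not summary.empty:
    summary.to_excel("summary.xlsx", index=False)
```

Because the tag column lives in the combined dataframe, the same `groupby` pattern used for Task/Subtask works for per-file analysis too, e.g. `all_data.groupby("File Name")["Hours"].mean()`.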