I have a CSV file (mass spec data) which looks like this:
#Header
#
"Cycle";"Time";"ms";"mass amu";"SEM c/s"
0000000001;00:00:01;0000001452;     1,00;       620
0000000001;00:00:01;0000001452;     1,20;      4730
0000000001;00:00:01;0000001452;     1,40;      4610
...       ;..:..:..;..........;.........;...........
I read it via:
 df = pd.read_csv(Filename, header=30, delimiter=';', decimal=',')
The result looks like this:
      Cycle      Time      ms  mass amu  SEM c/s
0         1  00:00:01    1452       1.0      620
1         1  00:00:01    1452       1.2     4730
2         1  00:00:01    1452       1.4     4610
...     ...       ...     ...       ...      ...
3872      4  00:06:30  390971       1.0    32290
3873      4  00:06:30  390971       1.2    31510
This data contains several mass spec scans with identical parameters. Cycle number 1 means scan 1, and so forth. I would like to calculate the mean of the last column, SEM c/s, over all cycles for each identical mass. In the end I would like to have a new data frame containing only:
ms  "mass amu"  "SEM c/s(mean over all cycles)"
Obviously the mean of the mass does not need to be calculated. I would like to avoid reading each cycle into a new dataframe, as this would mean I have to look up the length of each mass spectrum. The "mass range" and "resolution" are obviously different for different measurements (Solution). I guess doing the calculation in numpy directly would be best, but I am stuck.
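To make the goal concrete, here is a rough sketch of the kind of computation I mean, not a confirmed solution. It assumes that grouping on the "mass amu" column is enough, because every cycle uses the same mass values. The small frame is made-up data standing in for the result of read_csv, and the names mean_df and "SEM c/s (mean)" are just placeholders. The ms column is left out because it differs between cycles.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the frame returned by read_csv:
# two cycles, each scanning the same three masses.
df = pd.DataFrame({
    "Cycle":    [1, 1, 1, 2, 2, 2],
    "mass amu": [1.0, 1.2, 1.4, 1.0, 1.2, 1.4],
    "SEM c/s":  [620, 4730, 4610, 640, 4750, 4630],
})

# pandas: group rows that share the same mass and average the signal.
# No need to know how many points each spectrum has.
mean_df = (
    df.groupby("mass amu", as_index=False)["SEM c/s"]
      .mean()
      .rename(columns={"SEM c/s": "SEM c/s (mean)"})
)

# numpy alternative: map each row to the index of its unique mass,
# then sum the signal per index and divide by the row count per index.
masses, inverse = np.unique(df["mass amu"].to_numpy(), return_inverse=True)
sums = np.bincount(inverse, weights=df["SEM c/s"].to_numpy())
counts = np.bincount(inverse)
means = sums / counts

print(mean_df)
```

Both variants rely on the mass values being exactly equal across cycles, which should hold when they are parsed from identical text in the file. If they can differ by rounding, the mass column would probably need to be rounded before grouping.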
Thank you in advance