I'm looking for suggestions on how to solve the following problem elegantly. Although performance isn't an issue in my specific case, I'd appreciate comments regarding good practices.
Thanks in advance!
The short version:
I'm trying to average matrix rows according to some logic, while ignoring NaN values. The code I currently have does not handle NaN values the way I want.
The long version:
My data is built in the following manner:
- A single (first) column of "bins". The amount of rows for every bin is not constant. The bins don't have to be integers. Rows are pre-sorted.
- A variable number of data columns, possibly including NaNs.
Here's an example:
DATA = [...
180     NaN     NaN     1.733
180     NaN     NaN     1.703
200     0.720   2.117   1.738
200     0.706   2.073   1.722
200     0.693   2.025   1.723
200     NaN     NaN     1.729
210     NaN     NaN     1.820
210     NaN     NaN     1.813
210     NaN     NaN     1.805
240     NaN     NaN     1.951
240     NaN     NaN     1.946
240     NaN     NaN     1.946
270     NaN     NaN     2.061
270     NaN     NaN     2.052
300     0.754   2.356   2.103
300     0.758   2.342   2.057
300     NaN     NaN     2.066
300     NaN     NaN     2.066 ];
The desired result is a matrix that contains the unique "bins" in the first column, and means "unspoiled by NaNs" in the rest, e.g.:
- If for a specific column+bin, there are only NaNs (in the above example: 1st data column+bin 210) - the result would be NaN.
- If for a specific column+bin there is a mix of NaNs and numbers, the result would be the mean of the valid numbers. In the above example: 1st data column+bin 200 should give (0.720+0.706+0.693)/3=0.7063-- note the division by 3 (and not 4) for this column+bin.
Here's the desired result for the above example:
RES = [...
180     NaN     NaN     1.718
200     0.7063  2.072   1.728
210     NaN     NaN     1.812
240     NaN     NaN     1.948
270     NaN     NaN     2.056
300     0.756   2.349   2.074 ];
What I tried so far:
This is some code I managed to compile from several sources. It is working well for column+bin that contain NaNs or numbers only.
nDataCols=size(DATA,2)-1;
[u,m,n] = unique(DATA(:,1));
sz = size(m);
N=accumarray(n,1,sz);
RES(length(u),nDataCols) = 0; %Preallocation
for ind1 = 1:nDataCols
    RES(:,ind1)=accumarray(n,DATA(:,ind1+1),sz)./N;
end
RES= [u,RES];
Here's what I'm currently getting:
RES = [...
180     NaN     NaN     1.718
200     NaN     NaN     1.728
210     NaN     NaN     1.812
240     NaN     NaN     1.948
270     NaN     NaN     2.056
300     NaN     NaN     2.074 ];
p.s.
- If by any chance this is easier to do using a spreadsheet software (such as MS Excel) - I'd love to hear ideas.
- Doing the computation on a per-column basis is my current idea on how to handle this. I was just wondering if there's a way to generalize it to take the complete matrix right away.
 
     
     
    