Here is a possible solution:
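```octave
[r, c] = size(A);
idx = bsxfun(@plus, (r:-1:1)', 0:c-1);   % idx(i,j) = r - i + j, constant along each diagonal
s = flipud(accumarray(idx(:), A(:)));    % one sum per diagonal, top-right diagonal first
```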
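Why this works: `idx(i,j)` equals `r - i + j`, so every element on the same diagonal (constant `j - i`) gets the same index, from 1 at the bottom-left corner up to `r + c - 1` at the top-right corner. `accumarray` adds up all elements that share an index, and `flipud` reverses the result so that `s(m)` is the sum of the diagonal with offset `k = c - m` (top-right first, bottom-left last). The minimal check below illustrates this on a small 3-by-4 matrix chosen only for illustration, comparing against an explicit loop over `diag(A, k)`; it is a sketch for understanding, not part of the benchmark.

```octave
% Hypothetical 3-by-4 test matrix, chosen only for illustration:
% A = [1 4 7 10; 2 5 8 11; 3 6 9 12]
A = reshape(1:12, 3, 4);
[r, c] = size(A);
idx = bsxfun(@plus, (r:-1:1)', 0:c-1);   % idx = [3 4 5 6; 2 3 4 5; 1 2 3 4]
s = flipud(accumarray(idx(:), A(:)));    % s = [10; 18; 24; 15; 8; 3]

% Reference: sum each diagonal explicitly with diag(A, k),
% going from offset k = c-1 (top-right) down to k = 1-r (bottom-left).
ref = zeros(r + c - 1, 1);
for m = 1:r + c - 1
  ref(m) = sum(diag(A, c - m));
end
disp(isequal(s, ref))   % true (1)
```

The loop is only there as a readable reference; the vectorized `bsxfun` + `accumarray` version is what you would use in practice.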
Compared with the `spdiags` approach proposed in another answer, this method is much faster in Octave (roughly 750 times faster for a 1000-by-1000 matrix in the run below). Benchmark:
```octave
A = rand(1000);

disp('---------bsxfun+accumarray----------')
tic
[r, c] = size(A);
idx = bsxfun(@plus, (r:-1:1)', 0:c-1);
s = flipud(accumarray(idx(:), A(:)));
toc

disp('---------spdiags----------')
tic
dsum = fliplr(sum(spdiags(A)));
toc
```
Result:
```
---------bsxfun+accumarray----------
Elapsed time is 0.0114651 seconds.
---------spdiags----------
Elapsed time is 8.62041 seconds.
```
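The timings only show speed. To confirm that the two methods return the same sums in the same order, a quick comparison along these lines can be run on a smaller random matrix. The size 200 and the tolerance `1e-10` are arbitrary choices for this sketch, not part of the benchmark above.

```octave
% Cross-check on a random matrix (smaller than the benchmark to keep spdiags quick)
A = rand(200);
[r, c] = size(A);
idx = bsxfun(@plus, (r:-1:1)', 0:c-1);
s = flipud(accumarray(idx(:), A(:)));   % column vector, top-right diagonal first
dsum = fliplr(sum(spdiags(A)));         % row vector, same ordering

% Both methods add the same elements but in different orders,
% so compare with a small tolerance instead of exact equality.
err = max(abs(s - dsum(:)));
fprintf('max abs difference: %g\n', err);
```

One caveat: with a single input, `spdiags(A)` returns only the nonzero diagonals of `A`, so for a matrix with an all-zero diagonal its result is shorter than `s` and the entries no longer line up. The `accumarray` version always returns all `r + c - 1` sums.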