Let's say I have two device_vector<byte> arrays, d_keys and d_data.
Suppose d_data is a flattened 3x5 2D array (e.g. { 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3 }) and d_keys is a 1D array of size 5 (e.g. { 1, 0, 0, 1, 1 }). How can I do a reduction that sums each row of d_data but only includes the values in columns where the corresponding d_keys value is one (e.g. ending up with a result of { 10, 23, 14 })?
The sum_rows.cu example lets me sum every value in each row of d_data, but that's not quite right, because it has no way to skip the columns whose key is zero.
Alternatively, I could use a zip_iterator to combine d_keys with one row of d_data at a time and do a transform_reduce that adds a value only if its key is one, but then I'd have to loop over the rows of d_data.
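To make the looped approach concrete, here is a minimal sketch of what I mean. It assumes byte is a typedef for unsigned char, that d_data is stored row-major, and that d_data.size() is a multiple of d_keys.size(). The names masked_value and masked_row_sums_loop are made up for illustration.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/functional.h>
#include <vector>

typedef unsigned char byte;  // assumption: byte is an 8-bit unsigned type

// Hypothetical functor: maps a (key, value) pair to value if key == 1, else 0.
struct masked_value
{
    typedef int result_type;

    template <typename Tuple>
    __host__ __device__
    int operator()(const Tuple& t) const
    {
        const byte key   = thrust::get<0>(t);
        const byte value = thrust::get<1>(t);
        return key == 1 ? static_cast<int>(value) : 0;
    }
};

// Hypothetical helper: one transform_reduce per row, driven by a host loop.
std::vector<int> masked_row_sums_loop(const thrust::device_vector<byte>& d_keys,
                                      const thrust::device_vector<byte>& d_data)
{
    const size_t cols = d_keys.size();
    const size_t rows = d_data.size() / cols;  // assumes an exact multiple

    std::vector<int> sums(rows);
    for (size_t r = 0; r < rows; ++r)
    {
        // Pair every key with the matching column of row r.
        auto first = thrust::make_zip_iterator(
            thrust::make_tuple(d_keys.begin(), d_data.begin() + r * cols));

        // Accumulate in int so that byte values cannot overflow.
        sums[r] = thrust::transform_reduce(first, first + cols,
                                           masked_value(), 0,
                                           thrust::plus<int>());
    }
    return sums;
}
```

This gives the right sums, but it launches one reduction per row from the host, which is exactly the loop I'd like to avoid.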
What I really need is some sort of transform_reduce_by_key, which isn't built in, but surely there must be a way to build it!
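One direction that might work is to combine reduce_by_key with fancy iterators, so that the transform happens on the fly and no host loop is needed. The sketch below is only a guess at how that could be assembled. It assumes byte is unsigned char, a row-major layout, and that d_data.size() is a multiple of d_keys.size(). The names row_of, col_of, masked_value and masked_row_sums are hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

typedef unsigned char byte;  // assumption: byte is an 8-bit unsigned type

// Hypothetical functor: flat index -> row index.
struct row_of
{
    typedef int result_type;
    int cols;
    __host__ __device__ int operator()(int i) const { return i / cols; }
};

// Hypothetical functor: flat index -> column index.
struct col_of
{
    typedef int result_type;
    int cols;
    __host__ __device__ int operator()(int i) const { return i % cols; }
};

// Hypothetical functor: (key, value) -> value if key == 1, else 0.
struct masked_value
{
    typedef int result_type;

    template <typename Tuple>
    __host__ __device__
    int operator()(const Tuple& t) const
    {
        const byte key   = thrust::get<0>(t);
        const byte value = thrust::get<1>(t);
        return key == 1 ? static_cast<int>(value) : 0;
    }
};

// Hypothetical helper: a single reduce_by_key over the whole flattened array.
thrust::device_vector<int> masked_row_sums(const thrust::device_vector<byte>& d_keys,
                                           const thrust::device_vector<byte>& d_data)
{
    const int cols = static_cast<int>(d_keys.size());
    const int n    = static_cast<int>(d_data.size());
    const int rows = n / cols;  // assumes an exact multiple

    thrust::counting_iterator<int> idx(0);

    // Reduction key: the row each flat index belongs to (0,0,0,0,0,1,1,...).
    auto row_ids = thrust::make_transform_iterator(idx, row_of{cols});

    // d_keys repeated once per row: element i reads d_keys[i % cols].
    auto mask = thrust::make_permutation_iterator(
        d_keys.begin(), thrust::make_transform_iterator(idx, col_of{cols}));

    // Values to reduce: d_data[i] where the mask is one, otherwise 0.
    auto masked = thrust::make_transform_iterator(
        thrust::make_zip_iterator(thrust::make_tuple(mask, d_data.begin())),
        masked_value());

    thrust::device_vector<int> rows_out(rows);  // receives the row indices
    thrust::device_vector<int> sums(rows);      // receives the per-row sums
    thrust::reduce_by_key(row_ids, row_ids + n, masked,
                          rows_out.begin(), sums.begin());
    return sums;
}
```

The row index plays the role of the reduction key here, the same idea sum_rows.cu uses, while d_keys only acts as a per-column mask that is applied inside the transform_iterator.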