
best strategy for many scattered reductions

Question asked by jason on Feb 14, 2015
Latest reply on Feb 23, 2015 by jason

I'm computing several statistics/reductions over a labeled image: bounding box, area, and the minimum and maximum values of the corresponding pixels in a source image (about 4 million pixels each for the label and source images). The maximum number of labels is generally close to a tenth of that. Labels can be distributed in any pattern, but large portions of the image often concentrate on a few labels.
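For reference, here is a CPU sketch of the statistics I mean, written with NumPy's scatter-reduce `ufunc.at` operations, which behave like the GPU's atomic min/max/inc (the function name `label_stats` and the layout are just my own illustration, not actual kernel code):

```python
import numpy as np

def label_stats(labels, source, num_labels):
    """Per-label area, bounding box, and source min/max via scatter-reduces.
    np.add.at / np.minimum.at / np.maximum.at are the CPU analogues of the
    atomic inc/min/max ops the GPU versions use."""
    h, w = labels.shape
    ys, xs = np.indices((h, w))
    lab = labels.ravel()
    src = source.ravel()

    area = np.zeros(num_labels, dtype=np.int64)
    np.add.at(area, lab, 1)                      # atomic_inc analogue

    x0 = np.full(num_labels, w, dtype=np.int64)  # bbox accumulators
    y0 = np.full(num_labels, h, dtype=np.int64)
    x1 = np.full(num_labels, -1, dtype=np.int64)
    y1 = np.full(num_labels, -1, dtype=np.int64)
    np.minimum.at(x0, lab, xs.ravel())           # atomic_min analogue
    np.minimum.at(y0, lab, ys.ravel())
    np.maximum.at(x1, lab, xs.ravel())           # atomic_max analogue
    np.maximum.at(y1, lab, ys.ravel())

    smin = np.full(num_labels, np.inf)
    smax = np.full(num_labels, -np.inf)
    np.minimum.at(smin, lab, src)
    np.maximum.at(smax, lab, src)
    return area, (x0, y0, x1, y1), smin, smax
```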


So far I have 2 implementations:

One uses the global data store and atomic max/min/inc operations directly. This achieves 30 ms/frame (sucks!).


The other works in LDS memory, assumes a small known upper bound on the label count, and has each workgroup do its atomic reductions in LDS. The workgroup then merges those partial results into the GDS via atomic max/min/inc. Higher label maximums start to limit wavefront occupancy and also increase the number of global writes. This achieves 2-3 ms per frame with a maximum of 512 labels.
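The win in the second version comes from replacing one global atomic per pixel with at most one global merge per (workgroup, label). Here is a CPU sketch of that two-level scheme for a single statistic, per-label minimum (the tiling loop stands in for workgroups and the `local_min` table stands in for LDS; all names are my own):

```python
import numpy as np

def two_level_min(labels, source, num_labels, tile=64):
    """CPU sketch of the LDS strategy for one statistic (per-label minimum):
    each 'workgroup' first reduces its tile into a private table, then does
    one merge per touched label into the global table, instead of one
    global atomic per pixel."""
    global_min = np.full(num_labels, np.inf)
    flat_lab = labels.ravel()
    flat_src = source.ravel()
    for start in range(0, flat_lab.size, tile):
        local_min = np.full(num_labels, np.inf)      # stands in for LDS
        for l, v in zip(flat_lab[start:start + tile],
                        flat_src[start:start + tile]):
            local_min[l] = min(local_min[l], v)      # LDS atomic_min analogue
        touched = np.isfinite(local_min)             # merge only labels seen
        global_min[touched] = np.minimum(global_min[touched],
                                         local_min[touched])
    return global_min
```

When many pixels share a few labels, each tile collapses thousands of updates into a handful of global merges, which matches the 30 ms to 2-3 ms improvement described above.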


So the small upper bound is achievable in some situations, but I'd really like to know whether there's a better strategy. I'd rather not go exploring willy-nilly more than I already have without checking whether anyone has suggestions for dealing with this kind of scattered atomic write/increment problem.


One idea I had was splitting things up spatially to distribute the atomic operations over different areas of memory, but that didn't help at all. I also tried varying the stride between elements with odd sizes, which likewise seemed to have no effect.