Has anyone run into this problem before? Despite its apparent simplicity (and usefulness), I've been unable to find a standard library implementation, an algorithm, or even a common name for it.

Suppose I have a long binary array of 0s and 1s (assume it is more than 1 MB long). Most bits are 0, but some are 1. I need to generate a list of integers holding the indices (positions) of the 1s in the array.
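To pin down the expected output, here is a minimal sequential reference in C. It is only a sketch of the requirement, not a GPU solution. The packing is an assumption (32 bits per word, with bit k of word w standing for element 32*w + k), and the name list_set_bits is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: write the index of every set bit to `out`,
 * in ascending order, and return how many indices were written.
 * Assumption: bit k of words[w] represents element 32*w + k.
 * `out` must have room for every set bit (at most 32 * nwords entries). */
size_t list_set_bits(const uint32_t *words, size_t nwords, uint32_t *out)
{
    assert(words != NULL && out != NULL);

    size_t n = 0;
    for (size_t w = 0; w < nwords; ++w) {
        uint32_t v = words[w];
        /* Shift the word right one bit at a time; stop once no bits remain. */
        for (uint32_t k = 0; v != 0; ++k, v >>= 1) {
            if (v & 1u)
                out[n++] = (uint32_t)(w * 32u) + k;
        }
    }
    return n;
}
```

Any parallel version should produce the same list, in the same order, as this loop.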

I have a rather kludgy solution (involving three kernel launches, local memory, calls to an undocumented AMD-specific function popcnt(), and a lot of conditionals), but I wonder if there's a known better way to do this.
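For readers trying to picture a three-pass scheme, one plausible structure is: count the set bits per word, take an exclusive prefix sum of those counts to get each word's write offset, then scatter the indices. The sketch below is a sequential C stand-in under that assumption. It is not the kernels mentioned above, the names popcount32 and compact_three_pass are hypothetical, and the portable bit-counting loop stands in for whatever hardware population-count function is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable population count: clears the lowest set bit until none remain. */
static uint32_t popcount32(uint32_t v)
{
    uint32_t c = 0;
    while (v) {
        v &= v - 1;
        ++c;
    }
    return c;
}

/* Hypothetical three-pass compaction. Each loop corresponds to what would be
 * one kernel launch on a GPU; here they simply run one after another.
 * Assumption: bit k of words[w] represents element 32*w + k.
 * `offsets` must hold nwords entries; `out` must hold one entry per set bit. */
size_t compact_three_pass(const uint32_t *words, size_t nwords,
                          uint32_t *offsets, uint32_t *out)
{
    assert(words != NULL && offsets != NULL && out != NULL);

    /* Pass 1: count the set bits in each word. */
    for (size_t w = 0; w < nwords; ++w)
        offsets[w] = popcount32(words[w]);

    /* Pass 2: exclusive prefix sum, turning counts into write offsets. */
    uint32_t total = 0;
    for (size_t w = 0; w < nwords; ++w) {
        uint32_t count = offsets[w];
        offsets[w] = total;
        total += count;
    }

    /* Pass 3: each word writes its indices starting at its own offset,
     * so no two words ever write to the same output slot. */
    for (size_t w = 0; w < nwords; ++w) {
        uint32_t v = words[w];
        uint32_t dst = offsets[w];
        for (uint32_t k = 0; v != 0; ++k, v >>= 1) {
            if (v & 1u)
                out[dst++] = (uint32_t)(w * 32u) + k;
        }
    }
    return total;
}
```

Because the offsets are fixed before the scatter begins, passes 1 and 3 have no dependencies between words, which is what would let them run in parallel. Only the prefix sum needs cooperation across work-items.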

We can help if you post your kernel code here.