I was wondering whether OpenCL offers a split operation like the one in the CUDA library. The operation works on a work group and sorts the items in a local (work-group) memory object, moving the items that have a 0 to the lower part of memory and the items that have a 1 to the upper part. For more information, see GPU Gems 3, Chapter 39, "Parallel Prefix Sum (Scan) with CUDA" (just search for "split" on that page).
Is there an equivalent, either in the OpenCL standard or in AMD's implementation?
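To make the behaviour concrete, here is a minimal sequential sketch in plain C of what I mean by split, following the scan-and-scatter formulation described on the GPU Gems page. The function name `split_by_bit` and its signature are my own placeholders, not part of CUDA or OpenCL. A work-group version would presumably replace the running count with a parallel exclusive scan over local memory.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical reference for "split": a stable partition of in[0..n) by one bit.
 * Items whose selected bit is 0 end up in the lower part of out, items whose
 * bit is 1 in the upper part, each group keeping its original order.
 * in and out must not overlap. */
void split_by_bit(const unsigned *in, unsigned *out, size_t n, unsigned bit)
{
    assert(bit < sizeof(unsigned) * CHAR_BIT);

    /* Pass 1: count the "false" items (bit == 0). */
    size_t total_false = 0;
    for (size_t i = 0; i < n; ++i)
        total_false += !((in[i] >> bit) & 1u);

    /* Pass 2: f is the exclusive scan of the "bit == 0" flags up to item i. */
    size_t f = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned b = (in[i] >> bit) & 1u;
        /* Destination: f for a false item, i - f + total_false for a true item. */
        size_t d = b ? (i - f + total_false) : f;
        out[d] = in[i];
        f += !b;
    }
}
```

For example, splitting `{5, 2, 7, 4, 1, 6, 3, 0}` on bit 0 should put the even values first and the odd values after them, with the order inside each group unchanged.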
I am now implementing split myself, since it does not seem to be available in the standard or in AMD's implementation. However, is there a scatter function? I need to permute the contents of a local (work-group) memory object based on a mask index array, something similar to the vector "shuffle" function but for general arrays.
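The scatter I have in mind can be sketched sequentially in plain C as follows. The name `scatter` and its parameters are hypothetical placeholders for illustration, not an existing OpenCL or AMD function, and the sketch assumes the index array is a permutation of 0..n-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for "scatter": item i of src is written to
 * position index[i] of dst. src and dst must not overlap. If index is a
 * permutation of 0..n-1, every slot of dst is written exactly once. */
void scatter(const int *src, const size_t *index, int *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        assert(index[i] < n);   /* guard against out-of-range destinations */
        dst[index[i]] = src[i];
    }
}
```

In a kernel I would expect each work-item to perform one iteration of this loop on local memory, followed by `barrier(CLK_LOCAL_MEM_FENCE)` before any work-item reads the permuted result, but that part is my assumption rather than something I have confirmed.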