I was wondering whether OpenCL has an operation equivalent to the split operation offered in the CUDA library. The operation works on a work group and sorts the items in a local (work-group) memory object by moving the items that have a 0 (in the bit being examined) to the lower part of memory and the items that have a 1 to the upper part. You can have a look at this page for more info: GPU Gems 3 - Chapter 39. Parallel Prefix Sum (Scan) with CUDA (just search for "split" on that page).
Are there any alternatives to this, whether in the OpenCL standard or in AMD's implementation?
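
To make the behaviour concrete, here is a minimal sequential sketch in plain C of what I understand the split step to do, following the scan-based description in that chapter. The names `split_by_bit` and `SPLIT_MAX_ITEMS` are made up for illustration; this is a host-side reference only, not OpenCL kernel code and not anything taken from the CUDA library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap, standing in for the work-group size. */
#define SPLIT_MAX_ITEMS 1024

/* Stable split of in[0..n) on bit number `bit` (which must be smaller
 * than the width of unsigned): items whose bit is 0 go to the lower
 * part of out, items whose bit is 1 go to the upper part.
 * Returns the number of items whose bit is 0. */
size_t split_by_bit(const unsigned *in, unsigned *out, size_t n, unsigned bit)
{
    size_t f[SPLIT_MAX_ITEMS]; /* f[i] = number of 0-bit items before i */
    size_t zeros = 0;
    size_t i;

    assert(n <= SPLIT_MAX_ITEMS);

    /* Exclusive prefix sum over the "bit is 0" flags. On a GPU this is
     * the part that would be done as a parallel scan in local memory. */
    for (i = 0; i < n; ++i) {
        f[i] = zeros;
        zeros += !((in[i] >> bit) & 1u);
    }

    /* Scatter: 0-bit items keep their rank among the zeros; 1-bit items
     * go after all zeros, ranked by how many ones precede them. */
    for (i = 0; i < n; ++i) {
        unsigned b = (in[i] >> bit) & 1u;
        size_t dst = b ? (zeros + (i - f[i])) : f[i];
        out[dst] = in[i];
    }
    return zeros;
}
```

For example, splitting `{5, 2, 7, 4, 1, 6}` on bit 0 should give `{2, 4, 6, 5, 7, 1}`, with the relative order inside each half preserved. What I am looking for is a work-group-level equivalent of this.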