I am not sure whether it's the most efficient solution.But here it is:
First use a parallel sorting algorithm(e.g Radix Sort) to sort the array.
After that lauch an openCL Kernel having dimension that of the array.Each thread needs to check 3 elements(except corner threads) :first with index same as the thread and the others just behind and ahead of this element.If any two/three of these elements are identical that element can be discarded.If all the three consecutive elements are different then the middle one is definitely unique.
Sorry for replying this late to a year old question. I appreciate you taking time for the solution, but I solved it in a different way. My way might be a little faster, but I have to benchmark.
The reason I needed the unique member thing is coz I was trying to devise a way to classify stuff using an integer key. I could straightforwardly sort it, but I found a better solution.
What I am doing is this:
I take the key of the first element of the array. Then, I run a comparison and store the result. ie, if it is equal to the first element, I store 1. Esle I store 0. Now, I do a parallel scan on the result. This is really fast in CUDA using population count, but we dont have that feature in Opencl. I also do a parallel scan inverting the bits of the result.
Now, using the result of the first scan, I bin all elements equal to the first element in one array, and using the result of the second scan, I bin all elements not eual to the first element in the second array.
I repeat the process on second array, until no elements remain.
This solves the binning problem. Classifying elements this way is much faster than full radix sort if there are only a few unique keys.
The number of prefix sum iterations required is equal to twice the number of unique keys present in the array. However, for radix sort, its always equal to the number of bits in the key, which in my case was much larger. Also, the number of bits that need to be prefix summed reduces with each iteration in my algorithm, but stays constant(equal to the number of elements) in radix sort.