I want to swap things like this:
I want to swap data[idx] with an element at a lower or higher index than idx. Is there any way to do this?
One more question: where is the array stored, on the GPU or the CPU?
All streams reside in GPU memory. You can access stream elements only inside a kernel. A kernel cannot read from and write to the same stream, so swapping elements within a single stream is not possible.
Is there any way to work around this?
Now I'm starting to wonder what people usually do with GPGPU computing if it is not really general-purpose.
It's a "streaming" environment, so it has limitations that other GPGPU solutions do not have. CUDA doesn't have this read/write limitation, and I don't think OpenCL, which should be out soon, will have it either.
That said, you can just create two streams: an input stream and an output stream. That is how you "handle" it, and the same applies any time you want to take an array (stream) as input, manipulate it, and output the result. Just be careful: if the algorithm in your kernel doesn't write every element of the output stream, you need to explicitly copy the untouched elements over from the input (or copy every element at the start of the kernel; I haven't really noticed a difference between the two).
This only really becomes a problem if you start to run out of memory, and not really even then, considering the maximum stream size is 8192x8192.
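Here is a minimal sketch of the two-stream idea, written as plain C that emulates a kernel on the CPU rather than as real stream-SDK code. The names swap_pairs_element and swap_pairs_pass are made up for illustration, and swapping neighbouring pairs is just an assumed example of a swap pattern. The point is that each output element is computed by reading only from the input buffer, so nothing is ever read and written in the same buffer, and an element with no swap partner is copied through explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element "kernel": produces output element idx by reading
   only from the input buffer. Neighbouring pairs are swapped
   (0 <-> 1, 2 <-> 3, ...). */
static float swap_pairs_element(const float *in, size_t n, size_t idx)
{
    size_t partner = idx ^ (size_t)1;  /* even idx -> idx + 1, odd idx -> idx - 1 */

    if (partner >= n) {
        /* Last element of an odd-length array has no partner:
           copy it through so the output is fully defined. */
        return in[idx];
    }
    return in[partner];
}

/* One pass over the data, standing in for a kernel launch.
   The input and output must be distinct buffers, which mirrors the
   "no read and write on the same stream" restriction. */
void swap_pairs_pass(const float *in, float *out, size_t n)
{
    assert(in != out);
    for (size_t idx = 0; idx < n; ++idx) {
        out[idx] = swap_pairs_element(in, n, idx);
    }
}
```

If you need several passes, you can alternate the roles of the two buffers: the output of one pass becomes the input of the next, so you never need more than two buffers of the same size.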