Hi everyone,
Recently I started analysing the execution time of different algorithms on GPGPU, and I found that the timings of the "BitonicSort" sample in the SDK are much larger than the results I have seen in papers and other documents. For instance, the kernel time for sorting an array of length 8388608 is 27s on the GPU and 46s on the CPU (a speedup of only about 2), but only 8s with the STL. I am confused that parallel sorting on the GPU is so much slower than the STL. Could anyone tell me why? Btw, I also don't understand why the complexity of this sample, as stated in the sample's doc, is O( N * log2(N) * log2(N) ) and not O( log2(N) * log2(N) ). I would also be very happy if someone could suggest a better design for Bitonic Sort.
My computer is a laptop with an Intel Core i7 and an ATI Mobility Radeon HD 5870.
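
To make the complexity question concrete, here is a minimal CPU sketch of a bitonic sorting network in Python. It is not the SDK sample's kernel, and the name bitonic_sort_counts is made up for illustration. It assumes the array length is a power of two, and it counts two things separately: the number of sequential stages (each stage could run as one parallel pass) and the total number of compare-exchange operations.

```python
def bitonic_sort_counts(values):
    """Sort a power-of-two-length list with a bitonic network.

    Returns (sorted_list, stages, compare_exchanges).
    Hypothetical helper for illustration, not the SDK sample's code.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    stages = 0             # sequential passes over the array
    compare_exchanges = 0  # total comparisons across all passes

    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # compare distance inside this merge step
            stages += 1
            # Every pair (i, partner) in this pass is independent of the
            # others, so on a GPU each pair could be one work-item.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    compare_exchanges += 1
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a, stages, compare_exchanges


if __name__ == "__main__":
    data = [5, 2, 7, 1, 8, 3, 6, 4]
    result, stages, ops = bitonic_sort_counts(data)
    print(result, stages, ops)
```

For N = 8 this reports 6 stages and 24 compare-exchanges, i.e. log2(N) * (log2(N) + 1) / 2 stages with N / 2 comparisons in each stage.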