Storing the variables in a private array and using "bubble sort" might help.
Just make sure that the FOR loops are unrolled (#pragma unroll)
And so... Constant indices are used to index the private array.
If you use a dynamically computed index, then compiler will shift the private array to global memory - resulting in a huge performance dip....
Please let us know if you solved this problem.
Ty for your answer that's what i did to resolve the problem, I went for a shell sort though.
I don't clearly understand what you mean about the difference of performance between private and global. I always though it was about the same latency.
Another question, if i'm running out of registers should i try to manually declare private the less used variables in order to keep registers for variables often used?
Private Array is usually mapped to registers -- ONLY IF the indices your code uses to access the array can be determined at COMPILE TIME
Otherwise, the Private Array may actually be stored in a separate section in global memory reserved for that work-item... This can make things very very slow.
Say, you are using quicksort -- You need to first take an element and determine which position in the array it should belong to. This can be computed only at run-time and not at compile time. So if you use quicksort for this small private array, you will have huge performance penalties.
Hope I have made myself clear. Please let me know if you still have problems with it.
Ok thank you for the explanation, so it seems it can't work in my case, a sort will use dynamic addressing since swaps depend on values encountered.
just reasking, can it be usefull to use private qualifier for variables that aren't used often if my kernel is running out of registers ?