cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

roger512
Adept II

Sorting very small array (< 16) with a single thread

Hi,

I am looking for suggestions to resolve efficiently a sorting problem. I have to sort one  very small array per work item (thread)  (< 16 elements).

Maybe it is possible to save the array in a float16 and do it efficiently only using registers, but what kind of sequencial algorithm would fit in that case ?

Roger

0 Likes
1 Solution
himanshu_gautam
Grandmaster

Storing the variables in a private array and using "bubble sort" might help.

Just make sure that the FOR loops are unrolled (#pragma unroll)

And so... Constant indices are used to index the private array.

If you use a dynamically computed index, then compiler will shift the private array to global memory - resulting in a huge performance dip....

Please let us know if you solved this problem.

View solution in original post

0 Likes
4 Replies
himanshu_gautam
Grandmaster

Storing the variables in a private array and using "bubble sort" might help.

Just make sure that the FOR loops are unrolled (#pragma unroll)

And so... Constant indices are used to index the private array.

If you use a dynamically computed index, then compiler will shift the private array to global memory - resulting in a huge performance dip....

Please let us know if you solved this problem.

0 Likes

Ty for your answer that's what i did to resolve the problem, I went for a shell sort though.

I don't clearly understand what you mean about the difference of performance between private and global. I always though it was about the same latency.

Another question, if i'm running out of registers should i try to manually declare  private the less used variables in order to keep registers for variables often used?

0 Likes

Private Array is usually mapped to registers -- ONLY IF the indices your code uses to access the array can be determined at COMPILE TIME

Otherwise, the Private Array may actually be stored in a separate section in global memory reserved for that work-item... This can make things very very slow.

Say, you are using quicksort -- You need to first take an element and determine which position in the array it should belong to. This can be computed only at run-time and not at compile time. So if you use quicksort for this small private array, you will have huge performance penalties.

Hope I have made myself clear. Please let me know if you still have problems with it.

0 Likes

Ok thank you for the explanation, so it seems it can't work in my case, a sort will use dynamic addressing since swaps   depend on values encountered.

just reasking, can it be usefull to use private qualifier for variables that aren't used often if my kernel is running out of registers ?

0 Likes