4 Replies Latest reply on Apr 18, 2013 6:06 AM by roger512

    Sorting very small array (< 16)  with a single thread

    roger512

      Hi,

       

      I am looking for suggestions to resolve efficiently a sorting problem. I have to sort one  very small array per work item (thread)  (< 16 elements).

       

      Maybe it is possible to save the array in a float16 and do it efficiently only using registers, but what kind of sequencial algorithm would fit in that case ?

       

      Roger

        • Re: Sorting very small array (< 16)  with a single thread
          himanshu.gautam

          Storing the variables in a private array and using "bubble sort" might help.

           

          Just make sure that the FOR loops are unrolled (#pragma unroll)

          And so... Constant indices are used to index the private array.

          If you use a dynamically computed index, then compiler will shift the private array to global memory - resulting in a huge performance dip....

           

          Please let us know if you solved this problem.

            • Re: Sorting very small array (< 16)  with a single thread
              roger512

              Ty for your answer that's what i did to resolve the problem, I went for a shell sort though.

               

              I don't clearly understand what you mean about the difference of performance between private and global. I always though it was about the same latency.

               

              Another question, if i'm running out of registers should i try to manually declare  private the less used variables in order to keep registers for variables often used?

                • Re: Sorting very small array (< 16)  with a single thread
                  himanshu.gautam

                  Private Array is usually mapped to registers -- ONLY IF the indices your code uses to access the array can be determined at COMPILE TIME

                  Otherwise, the Private Array may actually be stored in a separate section in global memory reserved for that work-item... This can make things very very slow.

                  Say, you are using quicksort -- You need to first take an element and determine which position in the array it should belong to. This can be computed only at run-time and not at compile time. So if you use quicksort for this small private array, you will have huge performance penalties.

                  Hope I have made myself clear. Please let me know if you still have problems with it.