4 Replies Latest reply on Jan 26, 2011 7:14 PM by LeeHowes

    radix vs bitonic sorting

    bubu

      I need to sort 64k values in form [key=32bits int,val=32bits int too].

      What do you think it will be faster for ATI's GPUs? Radix sorting or bitonic sorting?

       

      I see only a an example showing bitonic sorting in the SDK... what about radix sorting?

       

      thx

        • radix vs bitonic sorting
          himanshu.gautam

          Radix sort is also present in the samples. I think you can see the docs present with the sample to setermine which algorithm is more suitable for your case.

          Radix sort complexity kN

          bitonic sort complexity logN*logN*N

            • radix vs bitonic sorting
              LeeHowes

              An efficient radix sort isn't in the samples unless someone put my version in there, but I don't think anybody fixed it up for public consumption. I keep meaning to.

              NVidia's samples have a fairly good implementation of a radix sort, though not the best available. Mark Harris published a couple of fairly good papers on the subject hence it being in there.

              Radix sort is O(NM) in theory, but you need a sensible implementation to efficiently use the memory system. I think for a 64k sort radix sort should be fastest, but I could be wrong because that sounds like a fairly borderline case for a 20-odd core chip like the 5870. It would probably depend on your key size, as well.

                • radix vs bitonic sorting
                  bubu

                   

                  Originally posted by: LeeHowes An efficient radix sort isn't in the samples unless someone put my version in there, but I don't think anybody fixed it up for public consumption. I keep meaning to.

                   

                  Will be fantastic to see both bitonic and radix examples in your SDK pls!

                  A think that scares me a bit is the fact that I could need a lot of passes/kernel calls. As 64k elements is relatively small, if the kernel calls are expensive the CPU might win.

                   

                    • radix vs bitonic sorting
                      LeeHowes

                      Oh, definitely. I was listening to a talk on this subject at the HiPEAC conference only this week. They were combining sorts at different sizes to get the overall peak performance.

                      I would intuitively think that 64k would be too small a sort to cover up/download overhead. Which means I wouldn't expect the GPU to win if the data started on the CPU, nor the CPU win if the data were on the GPU to start with.

                      The GPU really isn't that much faster than the CPU, after all.