cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

bubu
Adept II

radix vs bitonic sorting

I need to sort 64k values in form [key=32bits int,val=32bits int too].

What do you think it will be faster for ATI's GPUs? Radix sorting or bitonic sorting?

 

I see only a an example showing bitonic sorting in the SDK... what about radix sorting?

 

thx

0 Likes
4 Replies
himanshu_gautam
Grandmaster

Radix sort is also present in the samples. I think you can see the docs present with the sample to setermine which algorithm is more suitable for your case.

Radix sort complexity kN

bitonic sort complexity logN*logN*N

0 Likes

An efficient radix sort isn't in the samples unless someone put my version in there, but I don't think anybody fixed it up for public consumption. I keep meaning to.

NVidia's samples have a fairly good implementation of a radix sort, though not the best available. Mark Harris published a couple of fairly good papers on the subject hence it being in there.

Radix sort is O(NM) in theory, but you need a sensible implementation to efficiently use the memory system. I think for a 64k sort radix sort should be fastest, but I could be wrong because that sounds like a fairly borderline case for a 20-odd core chip like the 5870. It would probably depend on your key size, as well.

0 Likes

Originally posted by: LeeHowes An efficient radix sort isn't in the samples unless someone put my version in there, but I don't think anybody fixed it up for public consumption. I keep meaning to.

 

Will be fantastic to see both bitonic and radix examples in your SDK pls!

A think that scares me a bit is the fact that I could need a lot of passes/kernel calls. As 64k elements is relatively small, if the kernel calls are expensive the CPU might win.

 

0 Likes

Oh, definitely. I was listening to a talk on this subject at the HiPEAC conference only this week. They were combining sorts at different sizes to get the overall peak performance.

I would intuitively think that 64k would be too small a sort to cover up/download overhead. Which means I wouldn't expect the GPU to win if the data started on the CPU, nor the CPU win if the data were on the GPU to start with.

The GPU really isn't that much faster than the CPU, after all.

0 Likes