
BarnacleJunior
Journeyman III

Incredible Radix Sort Performance on HD5850 - crushes CUDPP

I've been working on a DX11 sort for a while. It's finally working, and working very well. I'm seeing 329 million pairs/sec for deinterleaved key/value pairs, 279 million pairs/sec for interleaved pairs, and 408 million uints/sec for keys alone. By comparison, this rather authoritative paper (http://mgarland.org/files/papers/gpusort-ipdps09.pdf) reports only 145 million pairs/sec on a GTX 280 using CUDPP. I have a more complete write-up here; I would be interested in comments.

http://forums.xna.com/forums/p/46766/279871.aspx#279871
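
For anyone unfamiliar with the terms above: "deinterleaved" pairs keep keys and values in two separate arrays, while "interleaved" pairs store each key next to its value. The sketch below is a plain CPU illustration in Python of a least-significant-digit radix sort over deinterleaved pairs. It is not the DX11 implementation from the write-up; the function names and the 4-bit digit size are assumptions chosen for illustration only.

```python
def deinterleave(pairs):
    """Split interleaved (key, value) pairs into two parallel lists."""
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    return keys, values


def radix_sort_pairs(keys, values, key_bits=32, radix_bits=4):
    """LSD radix sort of deinterleaved key/value pairs (unsigned integer keys).

    Hypothetical helper for illustration; returns new sorted lists.
    """
    buckets = 1 << radix_bits
    mask = buckets - 1
    keys, values = list(keys), list(values)

    for shift in range(0, key_bits, radix_bits):
        # 1. Histogram: count how many keys fall into each digit bucket.
        counts = [0] * buckets
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # 2. Exclusive prefix sum: first output slot for each digit.
        offsets = [0] * buckets
        total = 0
        for d in range(buckets):
            offsets[d] = total
            total += counts[d]

        # 3. Stable scatter: move each key and its value to its slot.
        out_keys = [0] * len(keys)
        out_values = [None] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_values[offsets[d]] = v
            offsets[d] += 1

        keys, values = out_keys, out_values

    return keys, values


if __name__ == "__main__":
    pairs = [(170, "a"), (45, "b"), (75, "c"), (90, "d"), (2, "e"), (45, "f")]
    ks, vs = deinterleave(pairs)
    print(radix_sort_pairs(ks, vs))
```

The histogram, prefix-sum, and scatter steps are the usual structure of a radix sort pass; a GPU version would presumably run each step in parallel across many threads, which this sketch does not attempt to model.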

 

.sean

2 Replies
eduardoschardong
Journeyman III

Congratulations!

 

Good job, I'm really impressed! Never thought it could be that fast.

 


Good job,

Though I've seen better radix sorts for CUDA than the one you linked to. I wouldn't put much stock in any "published" papers, since the publication process is so broken (I'm sure most of the academic community would disagree with me here, of course). Also, most of the CUDA libraries are not that efficient. For example, I've seen great FFT speedups on CUDA when compared to CUFFT.
