Incredible Radix Sort Performance on HD5850 - crushes CUDPP

Discussion created by BarnacleJunior on Jan 28, 2010
Latest reply on Jan 29, 2010 by ryta1203

I've been working on a DX11 radix sort for a while. It's finally working, and working very well. I'm seeing 329 million pairs/sec for deinterleaved key/value pairs, 279 million pairs/sec for interleaved pairs, and 408 million uints/sec for keys alone. By comparison, this rather authoritative paper reports only 145 million pairs/sec on a GTX 280 using CUDPP. I have a more complete write-up here. I would be interested in comments.
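
For anyone unsure what "deinterleaved" versus "interleaved" pairs means here, the following is a minimal CPU sketch in Python, not the DX11 compute shader itself. It assumes 32-bit unsigned keys and an 8-bit digit per pass; the function names (`radix_sort_pairs`, `deinterleave`, `interleave`) are made up for illustration. Deinterleaved means keys and values live in two separate arrays; interleaved means a single array of (key, value) records.

```python
def radix_sort_pairs(keys, values, bits_per_pass=8):
    """LSD radix sort of 32-bit unsigned keys in a deinterleaved layout.

    keys and values are separate, equal-length lists; each value follows
    its key to the same output index. Illustrative only: 8 bits per pass
    is an assumption, not a figure from the post.
    """
    buckets = 1 << bits_per_pass
    mask = buckets - 1
    for shift in range(0, 32, bits_per_pass):
        # Histogram: count how many keys fall into each digit bucket.
        counts = [0] * buckets
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum turns the counts into starting offsets.
        offsets = [0] * buckets
        total = 0
        for d in range(buckets):
            offsets[d] = total
            total += counts[d]
        # Stable scatter: key and value are written to the same slot.
        out_keys = [0] * len(keys)
        out_values = [None] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_values[offsets[d]] = v
            offsets[d] += 1
        keys, values = out_keys, out_values
    return keys, values


def deinterleave(pairs):
    """Split [(key, value), ...] into a key array and a value array."""
    return [k for k, _ in pairs], [v for _, v in pairs]


def interleave(keys, values):
    """Zip a key array and a value array back into (key, value) records."""
    return list(zip(keys, values))
```

Interleaved input can be handled by calling `deinterleave` first and `interleave` afterwards. That conversion is extra data movement, which is one plausible (but unconfirmed) reason for interleaved throughput being lower than deinterleaved throughput.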