Jan 29, 2010 3:11 PM by ryta1203

    Incredible Radix Sort Performance on HD5850 - crushes CUDPP


I've been working on a DX11 sort for a while. It's finally working, and working very well. I'm seeing 329 million pairs/sec for deinterleaved key/value pairs, 279 million pairs/sec for interleaved pairs, and 408 million uints/sec for keys alone. By comparison, this rather authoritative paper (http://mgarland.org/files/papers/gpusort-ipdps09.pdf) reports only 145 million pairs/sec on a GTX 280 using CUDPP. I have a more complete write-up here. I'd be interested in comments.
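For readers unfamiliar with the terms, the CPU sketch below shows what a least-significant-digit (LSD) radix sort over deinterleaved key/value pairs does. It is not the DX11 implementation. It assumes that "deinterleaved" means keys and values live in two separate arrays (an "interleaved" layout would store each key next to its value), that keys are 32-bit unsigned integers, and that each pass handles 8 bits. The function name `radix_sort_pairs` and its parameters are hypothetical. Each pass performs the three steps a GPU radix sort parallelizes: a digit histogram, a prefix sum over that histogram, and a stable scatter.

```python
def radix_sort_pairs(keys, values, bits_per_pass=8, key_bits=32):
    """Hypothetical LSD radix sort for deinterleaved pairs (separate key/value lists).

    Assumes keys are unsigned ints below 2**key_bits and that key_bits is a
    multiple of bits_per_pass. Returns new (keys, values) lists sorted by key;
    the sort is stable, so equal keys keep their original relative order.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    radix = 1 << bits_per_pass
    mask = radix - 1
    n = len(keys)
    for shift in range(0, key_bits, bits_per_pass):
        # Step 1: histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Step 2: exclusive prefix sum gives each digit's first output slot.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Step 3: stable scatter; each value moves with its key.
        out_keys = [0] * n
        out_values = [None] * n
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_values[offsets[d]] = v
            offsets[d] += 1
        keys, values = out_keys, out_values
    return keys, values


keys, vals = radix_sort_pairs([3, 1, 2, 1], ["c", "a", "b", "A"])
print(keys, vals)  # [1, 1, 2, 3] ['a', 'A', 'b', 'c']
```

With 8-bit digits, a 32-bit key takes four passes. In the deinterleaved layout, every scatter moves two separate arrays. In an interleaved layout, the same pass would move one array of (key, value) records instead.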