With Catalyst 11.3 and APP SDK 4.0, everything works as expected.
When using Catalyst 11.4 or newer:
The radix sort from the NVIDIA SDK became a little faster, but the bitonic sort and atomic operations (global atomic inc) appear to have become slower.
Has anyone else seen this?
(The bitonic sort is the one from the particles demo.)
Radeon 5870, Windows Vista 64-bit, Service Pack 2