cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

Wibowit
Journeyman III

Designing Radix Sort for Radeon architectures

How to achieve maximum speed?

Hi,

 

I want to develop an efficient BWT implementation on Radeon GPGPU. For that I need an efficient sorting procedure. I've found Bitonic Sort and Radix Sort suitable to GPGPU.

 

Unfortunately it seems that both OpenCL implementations are slower than CUDA ones and Radeons are slower than GeForces at sorting:

http://unigine.blogspot.com/2010/09/cuda-vs-opencl-vs-spu-part-iv.html

http://code.google.com/p/back40computing/wiki/RadixSorting

http://encode.ru/threads/1208-CUDA-GPU-based-BWT-ST4-sorting?p=24570&viewfull=1#post24570

 

I'm thinking about some hybrid of Radix Sorting and Bitonic Sorting. What are your suggestions? Where should I start (re)searching?

 

And another question. Radeon's Stream Processors are grouped to 4 or 5-elements wide units. Are they able to load uncontiguous data from registers? Ie. one SP reads or writes some 32-bit value, in parallel second SP reads/ writes another 32-bit value that doesn't have adjacent address, in parallel third, fourth (and maybe fifth) SP from unit does the same thing.

Or, to put it in another perspective, would that code be executed in one instruction and memory operation would be executed paralelly?

a += array;

b += array;

c += array;

d += array;

 

Or another code:

array++;

array++;

array++;

array++;

 

What level of memory parallelism should I expect?

0 Likes
3 Replies
himanshu_gautam
Grandmaster

I guess it is difficult to predict the behaviour of compiler. The best solution would be to use the profiler or SKA to look for how many instruction clauses are created for your code. They can really help in parallelizing the code.

Thanks

0 Likes
vic20
Journeyman III

you can find my implementation of radix sort

http://code.google.com/p/ocl-radix-sort/

 

it works well for amd GPU and CPU

see also

http://hal.archives-ouvertes.fr/hal-00596730

 

I hope it could help... 

0 Likes