Get peak performance on Radeon 5870

Question asked by marcus-from-europe on Feb 4, 2014
Latest reply on Jun 9, 2014 by sudarshan

Hi guys,


I am new to this forum and I have to admit that I am not a very experienced GPU programmer. Nonetheless, I want to write some code that reaches the peak performance of my Radeon 5870 (2.72 TFlops). To get an easy start, I downloaded "FlopsCL" by Kamil Rocki (see


Running the benchmark tool, I got 2.15 TFlops (using float4). That's impressive, but still well short of peak performance. So I fired up CodeXL / AMD AppAnalyzer. The results:

     KernelOccupancy = 100

     ALUBusy = 49.84%

     ALUPacking = 79.93%


How can I optimize the kernel code to reach full peak performance? Above all, is 'peak performance' reachable at all, even in a synthetic test, or is it just a number calculated from the hardware specs? And how can I get better ALUPacking (apparently only 4 of the 5 ALUs in each VLIW5 unit are being used)?
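
For reference, here is a rough sanity check of the numbers above as a small Python sketch. The stream processor count (1600), the 850 MHz core clock, and the 2 flops per lane per clock (one multiply-add) are assumptions taken from the card's published specs, not from the profiler output, and the variable names are only illustrative.

```python
# Back-of-the-envelope check of the figures quoted in this post.
# Assumptions (not from the profiler): published Radeon HD 5870 specs of
# 1600 stream processors at 850 MHz, one multiply-add (2 flops) per lane per clock.
STREAM_PROCESSORS = 1600
CORE_CLOCK_GHZ = 0.85
FLOPS_PER_LANE_PER_CLOCK = 2
VLIW_WIDTH = 5  # lanes per VLIW5 unit

# Theoretical single-precision peak in TFlops.
peak_tflops = STREAM_PROCESSORS * CORE_CLOCK_GHZ * FLOPS_PER_LANE_PER_CLOCK / 1000.0

# Measured FlopsCL result with float4, as reported above.
measured_tflops = 2.15
fraction_of_peak = measured_tflops / peak_tflops

# Ceiling if only 4 of the 5 VLIW lanes ever receive work.
four_slot_ceiling = peak_tflops * 4 / VLIW_WIDTH

print(f"theoretical peak:   {peak_tflops:.2f} TFlops")
print(f"measured / peak:    {fraction_of_peak:.1%}")
print(f"4-of-5 lane limit:  {four_slot_ceiling:.3f} TFlops")
```

If those assumptions hold, 2.15 TFlops is roughly 79% of the 2.72 TFlops peak. That is close to the 4/5 = 80% one would expect if float4 code fills only four of the five VLIW5 slots, and it is in line with the reported ALUPacking of 79.93%.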