OpenCL

Dr_Haribo
Journeyman III

Distributing kernel binaries

The AMD OpenCL compiler seems to have changed a lot going from Catalyst 11.11 to 11.12. A kernel source that would compile into tightly packed VLIW ALU instruction groups on 11.11 is likely to have horrible performance on 11.12.

Might it be an idea to distribute kernel binaries (in addition to source) with my program in case compiling the source on the user's computer yields bad performance?

Is there a way to compile binaries for AMD GPUs I don't have? AMD APP KernelAnalyzer seems to do this, but I see no way to do it through the OpenCL API. Nor does there seem to be a way to save the different binaries the KernelAnalyzer makes. I have a 6990 and a 5970, but nothing from the 4000-series.

Are 3 binaries enough: VLIW5 on the 4000-series, VLIW5 on the 5000-series and later, and VLIW4? Or do I need more specific binaries than that?

Is there an easy way to match precompiled binaries to the user's GPUs? Anything other than going by the device name reported by OpenCL and mapping that to architecture myself?

 

3 Replies
nou
Exemplar

http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=115

You need more than 3 binaries. It is 3 binaries per generation.

I don't see any way to match devices other than by name. Why would that be a problem?

Also, you should make a test case and send it to AMD so they can look into that regression.
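
The name-based matching described above could look roughly like the sketch below. It assumes the program has already read the device name from OpenCL (the `CL_DEVICE_NAME` query) and that the binaries ship in one directory, one file per device. The file naming scheme and the helper names `binary_filename` and `pick_binary` are hypothetical, chosen only for illustration. The lookup is an exact match on the name with no fallback to a "similar" device; when nothing matches, it returns `None` so the caller can compile from source instead.

```python
from pathlib import Path
from typing import Optional


def binary_filename(device_name: str) -> str:
    """Turn a device name string into a safe file name (hypothetical scheme)."""
    # Lowercase the name and replace anything that is not a letter or digit.
    cleaned = "".join(c if c.isalnum() else "_" for c in device_name.strip().lower())
    return cleaned + ".bin"


def pick_binary(device_name: str, binary_dir: Path) -> Optional[Path]:
    """Return the shipped binary for exactly this device name, or None.

    None means "no precompiled binary for this device": the caller should
    fall back to building the kernel from source.
    """
    candidate = binary_dir / binary_filename(device_name)
    return candidate if candidate.is_file() else None
```

A caller would pass the bytes of the returned file to `clCreateProgramWithBinary`, and build from source whenever the result is `None` or the binary fails to load.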

MicahVillmow
Staff

Yes, as nou said, send us a test case and we will look into it and see about getting it fixed.

Also, using the offline devices extension, you can easily create binaries for each device. I would not recommend creating binaries that work on more than one device, because a future optimization for a specific device might break if the device and the binary do not match.
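
As a rough sketch of the bookkeeping side of that advice: once a program has been built for every device in a context created with the `CL_CONTEXT_OFFLINE_DEVICES_AMD` property, the per-device binaries can be read back with `clGetProgramInfo` (`CL_PROGRAM_BINARIES`) and stored one file per device. The code below covers only the storage step and assumes the caller already has the device names and binary blobs in hand. The function name `save_binaries`, the file naming scheme, and the manifest layout are hypothetical. Recording the driver version alongside the binaries is also an assumption on my part, motivated by the compiler differences between Catalyst releases discussed in this thread.

```python
import json
from pathlib import Path
from typing import Dict


def save_binaries(binaries: Dict[str, bytes], driver_version: str, out_dir: Path) -> Path:
    """Write one binary file per device plus a manifest; return the manifest path.

    `binaries` maps a device name to the binary built for that device only.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"driver_version": driver_version, "devices": {}}
    for device_name, blob in sorted(binaries.items()):
        # Hypothetical naming scheme: lowercase, non-alphanumerics become "_".
        stem = "".join(c if c.isalnum() else "_" for c in device_name.strip().lower())
        file_name = stem + ".bin"
        (out_dir / file_name).write_bytes(blob)
        manifest["devices"][device_name] = file_name
    manifest_path = out_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

Keeping one entry per device, rather than one per architecture family, follows the recommendation above not to share a binary between devices.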
Dr_Haribo
Journeyman III

Thanks for useful info on offline compilation!

For an example of performance degradation in the recent SDK, take a look at this:

https://bitcointalk.org/index.php?topic=25860.0

Notice the difference: the HD5870 version compiled under Catalyst 11.7 yields a kernel with 1363 ALU instruction groups, while Catalyst 11.12 yields 1426, or 1400 after they tweaked it. I am having similar issues. It seems very difficult to get the latest compiler to compile anything well for VLIW5.

You can download the kernel from a link at the URL above and try for yourself with different versions of the AMD compiler.

 
