Archives Discussions

FamilyGuy · ‎08-23-2010

Question: say I have a large number of constants that are known at compile time. What is the most efficient way to use them? I could write them as literals in the kernel, or I could put them in a constant buffer (provided there aren't too many).

On the one hand, the GPU should be able to issue one instruction every cycle, including the literal, to all thread processors simultaneously. But the code has to come from somewhere and travel over some bus, so will the increased code size slow things down?

On the other hand, the constant buffer is accessed only once the instruction executes, which introduces some latency. But then again the code cache is fairly large and offers relatively high bandwidth (I am assuming here that constants are broadcast to all thread processors; correct me if I'm wrong).

So which is better? Or are the two treated identically internally?

Thx.

MicahVillmow · ‎08-23-2010

For constants, the access methods rank from highest to lowest performance as follows:
literals
constant ptr w/ compile time constant
constant ptr w/ runtime constant for all threads
constant ptr w/ linear access from all threads
constant ptr w/ random access
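
To make the five cases concrete, here is a rough sketch in plain C. It only emulates the access patterns on the host; it is not IL or kernel code. It assumes that "compile time constant" and "runtime constant" refer to the index used with the constant pointer, which is my reading of the list above. The names `table`, `tid`, `k`, and `idx` are made up for illustration, with `tid` standing in for the thread id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a constant buffer (names are illustrative). */
#define TABLE_SIZE 8
static const float table[TABLE_SIZE] = {
    1.0f, 2.0f, 4.0f, 8.0f, 16.0f, 32.0f, 64.0f, 128.0f
};

/* 1. Literal: the value is written directly into the code. */
static float use_literal(float x) {
    return x * 4.0f;
}

/* 2. Constant pointer, compile-time constant index. */
static float use_fixed_index(const float *cb, float x) {
    return x * cb[2];
}

/* 3. Constant pointer, runtime index that is the same for all threads. */
static float use_uniform_index(const float *cb, size_t k, float x) {
    assert(k < TABLE_SIZE);
    return x * cb[k];
}

/* 4. Constant pointer, linear access: the index follows the thread id. */
static float use_linear_index(const float *cb, size_t tid, float x) {
    return x * cb[tid % TABLE_SIZE];
}

/* 5. Constant pointer, random access: the index is data-dependent and
      can differ arbitrarily from one thread to the next. */
static float use_random_index(const float *cb, const unsigned *idx,
                              size_t tid, float x) {
    return x * cb[idx[tid] % TABLE_SIZE];
}
```

The first three variants compute the same product when `k` is 2; only the place the factor comes from changes. In the last two, the address differs per thread, which is the part that matters for the ranking.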

FamilyGuy · ‎08-23-2010

Thanks, that was very helpful.

But is there a reasonable limit on the number of literals in a kernel? At what point does performance start to suffer?

Thx.

MicahVillmow · ‎08-23-2010

Literals are embedded in the instruction itself, so in theory there is no limit on their number. In practice the limit is 16k unique literals per compilation unit.

Best way to use constants in an IL kernel