Best way to use constants in an IL kernel

Question: say I have a large number of constants that are known at compile time. What is the most efficient way of using them? I could declare them as literals in the kernel, or I could set up a constant buffer (provided there aren't too many).

On one hand the GPU should be able to issue one instruction every cycle, including the literal, to all thread processors simultaneously. But then again the code must come from some place and go over some bus. So will the increased code size slow things down?

On the other hand, access to the constant buffer will occur only after the instruction has been executed, so a certain latency will be introduced. But then again the code cache is fairly large and offers a relatively high bandwidth (here I assume that constants are broadcast to all thread processors, correct me if I'm wrong).

So which is better? Or are the two ways treated identically internally?
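
For reference, the two options in the question might look roughly like this in IL. This is only a fragment, not a complete kernel: the register numbers, the buffer size, and the literal values are made up for illustration, and r1 is assumed to already hold input data.

```
; Option 1: literal, declared in the kernel text itself
; (hex bit patterns for 1.0, 2.0, 3.0, 4.0)
dcl_literal l0, 0x3F800000, 0x40000000, 0x40400000, 0x40800000

; Option 2: constant buffer with 4 four-component slots,
; assumed to be filled by the host before the kernel runs
dcl_cb cb0[4]

mul r0, r1, l0        ; operand taken from the literal
mul r2, r1, cb0[0]    ; operand read from slot 0 of the constant buffer
```

Both instructions compute the same kind of product; the only difference is where the second operand comes from.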


3 Replies

For constants, the access methods rank from highest to lowest performance as follows:
constant ptr w/ compile time constant
constant ptr w/ runtime constant for all threads
constant ptr w/ linear access from all threads
constant ptr w/ random access
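
As a rough illustration of these four cases, reading "constant ptr" as a constant-buffer read (an assumption; the reply does not show code), the difference is in how the index is formed. The registers r10, r11, and r12 are hypothetical and assumed to have been set up earlier in the kernel, with indices that stay inside the declared buffer.

```
dcl_cb cb0[64]        ; buffer size chosen only for this sketch

mov r0, cb0[3]        ; 1. index fixed at compile time
mov r1, cb0[r10.x]    ; 2. index computed at run time, r10.x assumed identical in every thread
mov r2, cb0[r12.x]    ; 3. r12.x assumed to increase linearly with the thread id
mov r3, cb0[r11.x]    ; 4. r11.x assumed to vary arbitrarily from thread to thread
```

The instructions in cases 2 to 4 have the same form; what changes is how the index values are distributed across threads, which is what the ranking above is about.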

Thanks, that was very helpful.

But is there a reasonable limit on the number of literals in a kernel? At what number will performance start to suffer?



Literals are embedded in the instruction itself, so there is no limit on the number of literals in theory. In practice it is limited to 16k unique literals in a compilation unit.