Question: say I have a large number of constants that are known at compile time. What is the most efficient way to use them? I could write them as literals in the kernel, or I could put them in a constant buffer (provided there aren't too many).
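To make the two options concrete, here is a minimal sketch of what I mean. It is written in CUDA purely for illustration; on other APIs the constant buffer would be set up differently. The kernel names, the coefficient values, and the helper `countMismatches` are all made up for this example.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Option 1: compile-time literals (hypothetical values). The compiler is
// free to encode these as immediates in the instruction stream.
#define C0 (0.5f)
#define C1 (1.5f)
#define C2 (-2.0f)
#define C3 (0.25f)

// Option 2: the same values placed in constant memory (hypothetical name).
__constant__ float kCoeffs[4] = {0.5f, 1.5f, -2.0f, 0.25f};

// Evaluates a cubic polynomial using literals.
__global__ void polyLiteral(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        y[i] = ((C3 * v + C2) * v + C1) * v + C0;
    }
}

// Evaluates the same polynomial, reading coefficients from constant memory.
// Every thread reads the same address, which is the broadcast-friendly case.
__global__ void polyConstant(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        y[i] = ((kCoeffs[3] * v + kCoeffs[2]) * v + kCoeffs[1]) * v + kCoeffs[0];
    }
}

// Runs both kernels on the same input and counts differing outputs.
int countMismatches(int n = 1024) {
    std::vector<float> hx(n), ya(n, 0.0f), yb(n, 0.0f);
    for (int i = 0; i < n; ++i) hx[i] = 0.001f * i;

    size_t bytes = n * sizeof(float);
    float *dx = nullptr, *dya = nullptr, *dyb = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dya, bytes);
    cudaMalloc(&dyb, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    polyLiteral<<<blocks, threads>>>(dx, dya, n);
    polyConstant<<<blocks, threads>>>(dx, dyb, n);

    // Blocking copies; they wait for the kernels above to finish.
    cudaMemcpy(ya.data(), dya, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(yb.data(), dyb, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dya);
    cudaFree(dyb);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (std::fabs(ya[i] - yb[i]) > 1e-5f) ++mismatches;
    return mismatches;
}

int main() {
    std::printf("mismatches: %d\n", countMismatches());
    return 0;
}
```

Both kernels compute the same thing; the sketch only shows where the constants live. Which one is faster is exactly what I'm asking, and I assume the answer would have to come from timing both on the actual hardware.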
On the one hand, the GPU should be able to issue one instruction every cycle, including the literal, to all thread processors simultaneously. But then again, the code must come from somewhere and travel over some bus. So will the increased code size slow things down?
On the other hand, the access to the constant buffer can only happen once the instruction has been issued, so it introduces some latency. But then again, the constant cache is fairly large and offers relatively high bandwidth (here I assume that constants are broadcast to all thread processors; correct me if I'm wrong).
So which is better? Or are the two approaches treated identically internally?
Thx.