What is the addressable unit of registers? A byte or a 4-byte integer/float, or a 16-byte vector?
If I declare a byte type private variable, will it occupy a 1-byte register, or 4-, or 16-byte register?
How are registers used by the ALU? In the following statements,
float4 x;
x.x = 1.f; // writing to the first element of x
float y = x.y + 2.f; // reading the second element of x
can the last two instructions be executed simultaneously?
Thank you in advance!
Registers can be thought of as 16 bytes of memory that are addressable as four 4-byte components on the VLIW architecture.
On the GCN architecture, a scalar register is 4 bytes and a vector register is 64x4 bytes.
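One way to picture the "four 4-byte components" description is the sketch below, in plain C. It is only an illustrative model, not the hardware register file or its ISA encoding; the names vliw_reg, reg_of_word, comp_of_word and reg_bytes are hypothetical.

```c
#include <stdint.h>

/* Illustrative model only: one 16-byte register viewed as four 4-byte components. */
enum { COMPONENTS_PER_REG = 4, BYTES_PER_COMPONENT = 4 };

typedef struct {
    uint32_t comp[COMPONENTS_PER_REG]; /* comp[0..3] play the role of .x .y .z .w */
} vliw_reg;

/* Hypothetical helpers: map a flat 4-byte word index to a (register, component) pair. */
static unsigned reg_of_word(unsigned word)  { return word / COMPONENTS_PER_REG; }
static unsigned comp_of_word(unsigned word) { return word % COMPONENTS_PER_REG; }

/* Size in bytes of one modeled register. */
static unsigned reg_bytes(void) { return COMPONENTS_PER_REG * BYTES_PER_COMPONENT; }
```

In this model, flat word 5 corresponds to component 1 (".y") of register 1, which is the two-level view of the same storage.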
Thank you Micah.
What is GCN?
On the Radeon 5870, if I declare a float variable and the compiler decides to allocate a register for it, will the variable occupy a whole 16-byte vector? Will the compiler consider packing multiple float variables into a 4-element vector?
I find it hard to predict when the compiler will allocate a register. I think there are a lot of optimizations inside the compiler.
GCN stands for Graphics Core Next and is our architecture for HD7XXX chips.
The compiler will pack registers when it can, but if too many registers are required, or if the index pattern into an array is dynamic and the array is above a certain threshold, then the registers get backed by memory.
Hi Micah,
I only know that the 5870 has 256 KB of vector registers. Does it also have separate scalar registers?
Do you mean that any array declared as a private variable and indexed dynamically will be pushed into global memory? An example is below:
__kernel void mykernel() {
    int array[10];
    int total = 0;
    for (int i = 0; i < 10; i++)
        total += array[i]; // is array indexed dynamically here, so that it is put into global memory and makes the kernel very slow?
}
That is a statically indexed array, as the indices can be determined at compile time.
The only way this gets pushed into global uncached memory is if the array is too large to fit into registers.
Only the HD7XXX series has scalar registers.
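The difference between static and dynamic indexing can be sketched in plain C as below. This is only an illustration of the two access patterns, not OpenCL kernel code; sum_static and pick_dynamic are hypothetical names, and whether a given array actually stays in registers is up to the compiler.

```c
#define N 10

/* Static indexing: the loop bounds are compile-time constants, so every
   index used on array[] can be determined at compile time. */
int sum_static(int seed) {
    int array[N];
    for (int i = 0; i < N; i++)
        array[i] = seed + i;
    int total = 0;
    for (int i = 0; i < N; i++)
        total += array[i];
    return total;
}

/* Dynamic indexing: idx is a run-time value, so the compiler cannot know
   which element is read. The caller must pass 0 <= idx < N. */
int pick_dynamic(int seed, int idx) {
    int array[N];
    for (int i = 0; i < N; i++)
        array[i] = seed + i;
    return array[idx];
}
```

The second pattern, combined with an array above the size threshold mentioned earlier in the thread, is the case in which the array may end up backed by memory.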
Hi Micah, I don't understand the phrase "addressable as 4 4-byte components".
Are the registers indexed in a linear space, like WORD0, WORD1, WORD2, WORD3, WORD4, ...
Or are they indexed in a two-layer space, like VECTOR0.x, VECTOR0.y, VECTOR0.z, VECTOR0.w, VECTOR1.x, ...
Registers are a precious resource on GPUs. I want to (1) save some registers and (2) speed up the calculations. Could you kindly answer yes or no to the following questions?
(1) a char variable occupies a 16bit register
(2) the following variables may be packed into a float4 by the compiler:
float x;
float3 y;
(3) accessing different components of a vector register will be serialized and slow down the calculation. E.g.
float4 v;
v.x = 1.f; // writing to the first element of v
float y = v.y + 2.f; // reading the second element of v
Because v.x and v.y are in the same vector register, is there an access conflict?
Thank you very much in advance!
1) A char occupies 4 bytes of a 16-byte register.
2) They may be packed into 4 components of up to 4 different registers.
3) No, what is in a vector in the source does not directly correspond to what is in a vector register.
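As a back-of-the-envelope illustration of answers 1) and 2), the sketch below counts 4-byte components and converts them to a best-case number of 16-byte registers. It assumes perfect packing, which the compiler does not guarantee; min_registers is a hypothetical helper, not a compiler or SDK function.

```c
/* Assumption taken from the answers above: every scalar, including a char,
   takes one 4-byte component, and a 16-byte register holds four components. */
enum { SLOTS_PER_REGISTER = 4 };

/* Best-case register count for a given number of 4-byte components,
   assuming the compiler packs them perfectly (rounds up to whole registers). */
static unsigned min_registers(unsigned components) {
    return (components + SLOTS_PER_REGISTER - 1) / SLOTS_PER_REGISTER;
}
```

For example, a float (1 component) plus a float3 (3 components) gives min_registers(4), which is 1 register in the best case, although the four components may in practice be spread over up to 4 different registers.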
Thank you very much Micah!
May I ask how registers are addressed? In a 1D space like WORD0, WORD1, WORD2, WORD3, WORD4, ..., or in a 2D space like VECTOR0.x, VECTOR0.y, VECTOR0.z, VECTOR0.w, VECTOR1.x, ...?
Sorry for so many questions. I don't understand GPU assembly.
This information can be found in our ISA documents here:
http://developer.amd.com/gpu_assets/R700-Family_Instruction_Set_Architecture.pdf
http://developer.amd.com/sdks/amdappsdk/assets/AMD_Evergreen-Family_Instruction_Set_Architecture.pdf
Look at the sections on data flow of registers.
Hi Micah,
Thank you very much for the useful references! I will read them when I get some time.
Vis Cocoa