Hi,
I'm having trouble with the register usage of one of my kernels again.
This time the kernel needs roughly 2500 bytes of register space per work-item.
I am running on a Radeon HD6450, where each compute unit has 256 KB of register space available.
The kernel has the following attribute directly before the actual __kernel definition:
__attribute__((reqd_work_group_size(64, 1, 1)))
My aim is to let the compiler use as many registers as possible, because I will launch the kernel with only 128 work-items, i.e. 2 wavefronts, and each wavefront should run on its own compute unit.
The problem is that the compiler spills registers, which, as far as I can tell, it shouldn't need to do.
My reasoning:
- 2500 bytes of register space per work-item
- 64 work-items per wavefront
- gives 2500 × 64 = 160,000 bytes of register space per wavefront
With 256 KB (262,144 bytes) available per compute unit, that is more than enough, so I don't understand why anything is spilled.
Can anyone point out where my reasoning goes wrong?
Thanks