Archives Discussions

KNeumann · ‎02-13-2012

Hi,

I am developing a signal-processing kernel that uses a lot of registers, and I have stumbled upon something I cannot explain.

My kernel declares function-local (private) arrays like this:

#define NDATA 64
#define NORDER 5
__kernel void myKernel( __global float2* inputData, __global float2* outputCoeff)
{
    float2 ef[NDATA];            // 128 * 32-bit
    float2 eb[NDATA];            // 128 * 32-bit
    float2 parcor[NORDER];       //  10 * 32-bit
    float2 arc[NORDER + 1];      //  12 * 32-bit
    float2 arc_tmp[NORDER + 1];  //  12 * 32-bit
    ...

During the execution of the kernel, some temporary variables are created, but not many.

When I estimate how many 32-bit registers the kernel uses, I get 290 (128 + 128 + 10 + 12 + 12) plus some temporaries. Let's go with 300 32-bit registers.
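The count of 290 can be checked with a few lines of host-side C. This is only a sketch of the arithmetic: the function name `private_array_regs32` is made up for illustration, and it assumes that every float2 element occupies exactly two 32-bit registers and that temporaries are ignored.

```c
#include <stddef.h>

/* Array sizes taken from the kernel above. */
#define NDATA 64
#define NORDER 5

/* Assumption: one float2 holds two 32-bit floats, one register each. */
#define FLOATS_PER_FLOAT2 2

/* Hypothetical helper: number of 32-bit values held by the five
   private arrays, not counting any temporaries. */
size_t private_array_regs32(void)
{
    size_t float2_elems = NDATA          /* ef      */
                        + NDATA          /* eb      */
                        + NORDER         /* parcor  */
                        + (NORDER + 1)   /* arc     */
                        + (NORDER + 1);  /* arc_tmp */
    return float2_elems * FLOATS_PER_FLOAT2;
}
```

Calling `private_array_regs32()` gives the figure for the arrays alone; the rounded value of 300 adds a small allowance for temporaries.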

I am running the kernel on a 6450, which has 16384 vector registers per CU, each 128 bits wide.

This would mean that each work-item needs 75 ( = 300 / 4) vector registers. So when I run the kernel with a work-group size of, say, 5, I shouldn't see any scratch register usage, because the CU has enough vector registers for many more than 5 work-items at once.
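The same back-of-the-envelope estimate can be written as a small C sketch. The helper names are hypothetical. It assumes that four 32-bit values pack perfectly into each 128-bit vector register and that the only limit is the 16384 registers per CU quoted above, which may not match how the compiler and hardware actually allocate registers.

```c
/* Figures quoted in the post. */
#define LANES_PER_VECTOR_REG 4u      /* 128-bit register = 4 x 32-bit */
#define VECTOR_REGS_PER_CU   16384u  /* vector registers per CU       */

/* Hypothetical helper: 128-bit registers needed for a given number of
   32-bit values, rounded up, assuming perfect packing. */
unsigned vector_regs_per_work_item(unsigned regs32)
{
    return (regs32 + LANES_PER_VECTOR_REG - 1u) / LANES_PER_VECTOR_REG;
}

/* Hypothetical helper: how many work-items fit into one CU's register
   file if this were the only constraint. */
unsigned work_items_fitting(unsigned vregs_per_item)
{
    return VECTOR_REGS_PER_CU / vregs_per_item;
}
```

With 300 32-bit registers, `vector_regs_per_work_item(300)` gives 75 and `work_items_fitting(75)` gives 218, so under this simple model a work-group of 5 would fit comfortably.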

When I analyze this kernel with the Kernel-Analyzer, the results puzzle me:

GPRs-Used: 9

Scratch-Regs used: 70

This is what confuses me.

Did I misunderstand something in the documentation, or am I just stuck so deep in the details that I cannot see the obvious flaw in my calculations?

I hope someone has some advice on this.

Thanks in advance

Usage of Registers in Kernel