
Usage of Registers in Kernel

Question asked by KNeumann on Feb 13, 2012
Latest reply on Feb 17, 2012 by gat3way

Hi,

 

I am developing a signal-processing kernel that uses a lot of registers, and I have stumbled over something I can't explain.

My kernel uses local variables like these:

#define NDATA 64
#define NORDER 5

__kernel void myKernel( __global float2* inputData, __global float2* outputCoeff)
{
          float2 ef[NDATA];     // 128 * 32bit
          float2 eb[NDATA];     //  128 * 32bit
          float2 parcor[NORDER];  // 10 * 32bit
          float2 arc[NORDER+1];    // 12 * 32bit
          float2 arc_tmp[NORDER+1];   // 12 * 32bit
     ...

 

During execution, the kernel creates a few temporary variables, but not many.

When I calculate how many 32-bit registers the kernel uses, I get 290 plus some temporaries, so let's say about 300 32-bit registers.

I am running the kernel on a 6450, which has 16384 vector registers per CU, each 128 bits wide.

This means each work-item needs 75 (= 300 / 4) vector registers. So when I run the kernel with a work-group size of, say, 5, I shouldn't see any scratch register usage, because the CU has enough vector registers for far more than 5 work-items at once.

 

When I analyze this kernel with the Kernel-Analyzer, the results don't make sense to me:

GPRs-Used: 9

Scratch-Regs used: 70

 

This is what confuses me.

Did I misread the documentation, or am I just too deep in the details to see the obvious flaw in my calculations?

 

I hope someone has some advice on this.

 

Thanks in advance
