
How can I estimate how many GPU resources a kernel will take, from KernelAnalyser2 info vs. clinfo data?

Question asked by twintip31 on May 29, 2013
Latest reply on May 30, 2013 by twintip31

Hi,

 

I programmed a fairly long kernel and it seems I cannot execute more than 1024 instances of it in parallel on my Tahiti card. I get a CL_OUT_OF_RESOURCES error code if my global_work_size parameter is > 1024, and I don't understand why.
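
As a first diagnostic step, the launch can be compared against what the runtime itself reports for this kernel. This is only a sketch: the names kernel, device, queue and global_size are placeholders for the objects already created in the host code, not names from the original post.

/* Sketch: query per-kernel resource limits and check the launch size.
 * `kernel`, `device` and `queue` are hypothetical placeholder names. */
#include <stdio.h>
#include <CL/cl.h>

static void check_launch(cl_kernel kernel, cl_device_id device,
                         cl_command_queue queue, size_t global_size)
{
    size_t   kernel_wg_max = 0;   /* largest work-group this kernel supports */
    cl_ulong private_bytes = 0;   /* private (scratch) memory per work-item */
    cl_ulong local_bytes   = 0;   /* LDS used per work-group */

    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kernel_wg_max), &kernel_wg_max, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_PRIVATE_MEM_SIZE,
                             sizeof(private_bytes), &private_bytes, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                             sizeof(local_bytes), &local_bytes, NULL);

    printf("kernel max work-group size: %zu, private mem: %llu B, local mem: %llu B\n",
           kernel_wg_max, (unsigned long long)private_bytes,
           (unsigned long long)local_bytes);

    /* Let the runtime pick the local size (NULL) and check the error code. */
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                        &global_size, NULL, 0, NULL, NULL);
    if (err != CL_SUCCESS)
        printf("clEnqueueNDRangeKernel failed: %d (CL_OUT_OF_RESOURCES = %d)\n",
               err, CL_OUT_OF_RESOURCES);
}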

 

Tahiti has a maximum work-group size of 256 and 28 compute units, so with a very simple kernel using a minimal set of ALUs, does that mean I could expect to run at least 28 x 256 instances of the same kernel across the GPU at the same moment? And if I then take the wavefront concept into account, I can run even more work-items?
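
For reference, the two device figures used in that 28 x 256 product are the same ones clinfo prints; as a small sketch (device is again a placeholder), they can be read from the runtime and combined directly:

/* Sketch: read the device figures that clinfo prints and combine them.
 * `device` is a placeholder for an already-selected cl_device_id. */
#include <stdio.h>
#include <CL/cl.h>

static void print_device_capacity(cl_device_id device)
{
    cl_uint cu_count    = 0;        /* compute units (28 on the poster's Tahiti) */
    size_t  max_wg      = 0;        /* max work-group size (256 on Tahiti) */
    size_t  max_item[3] = {0,0,0};  /* max work-items per dimension */

    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(cu_count), &cu_count, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_wg), &max_wg, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    sizeof(max_item), max_item, NULL);

    /* 28 CUs x 256 work-items per group = 7168 work-items as a naive
     * "one group per CU" figure; real residency is counted in 64-wide
     * wavefronts and is usually higher (see the occupancy sketch further down). */
    printf("CUs = %u, max work-group = %zu, naive capacity = %zu\n",
           cu_count, max_wg, (size_t)cu_count * max_wg);
    printf("max work-item sizes = %zu x %zu x %zu\n",
           max_item[0], max_item[1], max_item[2]);
}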

 

I understand that if my kernel uses too many registers, that would reduce the resources available per work-item and therefore reduce how many instances of my kernel can execute in "parallel" across all of the available work-items and compute units.

 

Is there a way, using the Tahiti figures above and the output data from KernelAnalyser2, to predict how many instances of my kernel I can run in "parallel" using all of the GPU resources?

 

Here is an example of the data generated by KernelAnalyser2 for my code (I only kept the non-zero values here):

; ----------------- CS Data ------------------------
codeLenInByte        = 2596;Bytes

userElementCount     = 3;
;  userElements[0]    = PTR_CONST_BUFFER_TABLE, 0, s[2:3]
;  userElements[1]    = IMM_UAV, 10, s[4:7]
;  userElements[2]    = IMM_UAV, 11, s[8:11]
extUserElementCount  = 0;
NumVgprs             = 27;
NumSgprs             = 24;
FloatMode            = 192;
IeeeMode             = 0;
ScratchSize          = 0;

;COMPUTE_PGM_RSRC2       = 0x00000098
COMPUTE_PGM_RSRC2:USER_SGPR      = 12
COMPUTE_PGM_RSRC2:TGID_X_EN      = 1
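
To turn those figures into an estimate of resident work-items, the kind of arithmetic CodeXL's occupancy view performs can be sketched by hand. The hardware limits used below (256 VGPRs and 512 SGPRs per SIMD, at most 10 wavefronts per SIMD, 4 SIMDs per CU, 64 work-items per wavefront) are the commonly quoted GCN/Tahiti values and are assumptions here, not part of the dump:

/* Sketch: estimate wavefront residency from the KernelAnalyser2 figures.
 * The GCN limits below (256 VGPRs/SIMD, 512 SGPRs/SIMD, 10 waves/SIMD,
 * 4 SIMDs/CU, 64 lanes/wave) are assumptions, not values from the dump. */
#include <stdio.h>

int main(void)
{
    const int num_vgprs = 27;   /* NumVgprs from the CS Data dump */
    const int num_sgprs = 24;   /* NumSgprs from the CS Data dump */
    const int cu_count  = 28;   /* compute units on the poster's Tahiti */

    int waves_by_vgpr  = 256 / num_vgprs;          /* 9  */
    int waves_by_sgpr  = 512 / num_sgprs;          /* 21 */
    int waves_per_simd = waves_by_vgpr < waves_by_sgpr ? waves_by_vgpr
                                                       : waves_by_sgpr;
    if (waves_per_simd > 10) waves_per_simd = 10;  /* hardware cap */

    int waves_per_cu   = waves_per_simd * 4;       /* 36 */
    int items_per_cu   = waves_per_cu * 64;        /* 2304 work-items */
    int items_on_gpu   = items_per_cu * cu_count;  /* ~64k resident work-items */

    printf("waves/SIMD=%d waves/CU=%d work-items/CU=%d work-items on GPU=%d\n",
           waves_per_simd, waves_per_cu, items_per_cu, items_on_gpu);
    return 0;
}

With these assumed limits, 27 VGPRs still allows 9 of the maximum 10 wavefronts per SIMD, so register pressure alone would not cap a launch at 1024 work-items.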


Here is the CodeXL analysis of kernel occupancy:


[Attached image: kernelOccupancyAnalysis.jpg]
