
Kernel performance variability with the same code and same data

Question asked by digbug on Jul 18, 2013
Latest reply on Jul 30, 2013 by himanshu.gautam

Platform:

OS: Windows 7 64-bit

SDK: APP SDK 2.8.1

CodeXL: latest

Driver: Catalyst 13.4

 

Problems:

When I run the kernel 100 times with the same input data, 2-3 executions (besides the first one) take significantly longer than the others. CodeXL reports almost the same GPU behavior for every run.

I would like help fixing this. Is it a runtime problem, a memory access pattern issue, or something else?

 

Sample code:

"

// Nearest-neighbour sampler (not used by this kernel).
__constant sampler_t imageSampler  = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
// Bilinear sampler used for the resize.
__constant sampler_t imageSampler2 = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;

__kernel void calBilinearImage(__write_only image2d_t dstimg, __read_only image2d_t srcimg, int2 srcSize, float2 rate, int2 dstSize)
{
    int2 gid = (int2)((int)get_global_id(0), (int)get_global_id(1));

    // Skip work-items that fall outside the destination image.
    if (gid.x >= dstSize.x || gid.y >= dstSize.y)
        return;

    // Map the destination pixel into source coordinates and sample bilinearly.
    float2 srcPos = (convert_float2(gid) + 0.4999999f) * rate;
    float4 temp = clamp(read_imagef(srcimg, imageSampler2, srcPos), 0.0f, 1.0f);
    write_imagef(dstimg, gid, temp);
}

"

 

CodeXL reports:

Kernel                       ExecOrder  ThreadID  CallIndex  GlobalWorkSize  WorkGroupSize  Time (ms)  Other counters
calBilinearImage__k1_Turks1  6          4756      64         {1280 960 1}    {16 16 1}      0.83511    0 2 NA 0 1 19200 17
calBilinearImage__k1_Turks1  7          4756      73         {1280 960 1}    {16 16 1}      0.834      0 2 NA 0 1 19200 17
calBilinearImage__k1_Turks1  8          4756      82         {1280 960 1}    {16 16 1}      3.13944    0 2 NA 0 1 19200 17
calBilinearImage__k1_Turks1  9          4756      91         {1280 960 1}    {16 16 1}      0.81578    0 2 NA 0 1 19200 17
calBilinearImage__k1_Turks1  10         4756      100        {1280 960 1}    {16 16 1}      0.81344    0 2 NA 0 1 19200 17
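
One way to pick the slow runs out of a longer log like the one above is to compare each time against the median of all runs. Below is a minimal sketch in C, assuming the per-run times have already been exported into an array. `median` and `find_slow_runs` are hypothetical helper names, and the threshold factor is an arbitrary choice, not something CodeXL defines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (ascending). */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples. Sorts a private copy so the caller's order is kept.
   Returns 0.0 if n == 0 or the allocation fails. */
double median(const double *t, size_t n)
{
    if (n == 0)
        return 0.0;
    double *tmp = malloc(n * sizeof *tmp);
    if (!tmp)
        return 0.0;
    memcpy(tmp, t, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2) ? tmp[n / 2]
                       : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return m;
}

/* Hypothetical helper: stores the indices of runs slower than
   factor * median in out (capacity n) and returns how many were found. */
size_t find_slow_runs(const double *t, size_t n, double factor, size_t *out)
{
    assert(t != NULL && out != NULL);
    double limit = factor * median(t, n);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (t[i] > limit)
            out[count++] = i;
    return count;
}
```

Reading the five rows above as 0.83511, 0.834, 3.13944, 0.81578 and 0.81344 (that split of the fused columns is itself an assumption), a factor of 2.0 flags only the third row.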
