
Strange OpenCL kernel behavior

Question asked by george72 on Feb 15, 2018
Latest reply on Mar 2, 2018 by dipak

Hi, I'm working on a game engine written in OpenCL/OpenGL, and we're having several issues getting our code to run properly on AMD GPUs. After comparing the output of the same code on NVIDIA and AMD GPUs, I noticed the following issue, which may indicate an OpenCL compiler bug:

 

The problem looks something like this (this is just pseudocode to illustrate the issue):

 

__kernel void my_kernel(__global int* data1, __global int* data2, __global float4* quats1, __global float4* quats2, __global float4* xyz1, __global float4* xyz2)
{
  int index = get_global_id(0);
  float4 quat;
  bool check = data1[index] != 0;

  // First if-statement: choose which buffer the quaternion is read from
  if (check)
  {
    quat = quats1[index];
  }
  else
  {
    if (data2[index] == 0) return;
    quat = quats2[index];
  }

  // This block of code (not shown) modifies the quat & generates a position
  // ...
  // ...
  float3 position = (float3)(1.0f, 2.0f, 3.0f);

  // Second if-statement: write back to the same pair of buffers
  if (check)
  {
    float3 xyz = xyz1[index].xyz;
    xyz += position;
    xyz1[index].xyz = xyz;

    quats1[index] = quat;
  }
  else
  {
    float3 xyz = xyz2[index].xyz;
    xyz += position;
    xyz2[index].xyz = xyz;

    quats2[index] = quat;
  }
}

 

The problem I'm encountering is this: for every work-item for which "check" is true, the results suggest that the "else" clause of the first if-statement was executed, while the second if-statement takes the correct branch.
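For reference when comparing the NVIDIA and AMD outputs, here is a minimal plain-C sketch of what a single work-item is expected to do. The float4 struct and the reference_item function are hypothetical names introduced only for illustration, and the elided block is replaced by a stand-in that leaves quat unchanged and uses the constant position (1, 2, 3).

```c
/* Plain-C stand-in for OpenCL's float4 (hypothetical helper type). */
typedef struct { float x, y, z, w; } float4;

/* CPU emulation of one work-item of my_kernel (hypothetical helper).
   Returns 1 if the work-item ran to the end, 0 if it returned early. */
static int reference_item(int index,
                          const int *data1, const int *data2,
                          float4 *quats1, float4 *quats2,
                          float4 *xyz1, float4 *xyz2)
{
    float4 quat;
    int check = data1[index] != 0;

    /* First if-statement: pick the buffer the quaternion is read from. */
    if (check) {
        quat = quats1[index];
    } else {
        if (data2[index] == 0) return 0;
        quat = quats2[index];
    }

    /* Stand-in for the elided block: quat is left unchanged and the
       position is the constant (1, 2, 3) from the pseudocode. */
    const float px = 1.0f, py = 2.0f, pz = 3.0f;

    /* Second if-statement: write back to the same pair of buffers. */
    if (check) {
        xyz1[index].x += px;
        xyz1[index].y += py;
        xyz1[index].z += pz;
        quats1[index] = quat;
    } else {
        xyz2[index].x += px;
        xyz2[index].y += py;
        xyz2[index].z += pz;
        quats2[index] = quat;
    }
    return 1;
}
```

Running this over every index on the host and comparing the buffers with those read back from the GPU should show which entries took the wrong path.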

 

While trying to figure out what was going wrong, I added a few "printf" statements as follows:

 

__kernel void my_kernel(__global int* data1, __global int* data2, __global float4* quats1, __global float4* quats2, __global float4* xyz1, __global float4* xyz2)
{
  int index = get_global_id(0);
  float4 quat;
  bool check = data1[index] != 0;

  // First if-statement: choose which buffer the quaternion is read from
  if (check)
  {
    if (index == 255) printf("read1\n");
    quat = quats1[index];
  }
  else
  {
    if (index == 255) printf("read2\n");
    if (data2[index] == 0) return;
    quat = quats2[index];
  }

  // This block of code (not shown) modifies the quat & generates a position
  // ...
  // ...
  float3 position = (float3)(1.0f, 2.0f, 3.0f);

  // Second if-statement: write back to the same pair of buffers
  if (check)
  {
    if (index == 255) printf("write1\n");
    float3 xyz = xyz1[index].xyz;
    xyz += position;
    xyz1[index].xyz = xyz;

    quats1[index] = quat;
  }
  else
  {
    if (index == 255) printf("write2\n");
    float3 xyz = xyz2[index].xyz;
    xyz += position;
    xyz2[index].xyz = xyz;

    quats2[index] = quat;
  }
}

 

But then the problem does not occur anymore!

Splitting this kernel into two separate kernels, one for each path, also fixes the problem, but I don't like this solution: the two paths differ only in where the data is read from and written to, and I'd like to keep them in a single kernel.
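One possible way to keep a single kernel when only the buffers differ is to choose the source and destination pointers once and share the rest of the code. Below is a minimal sketch in plain C, so the logic can be checked on the CPU. The select_item function and the float4 struct are hypothetical names, the elided block is again a stand-in, and whether the equivalent structure in OpenCL C (with the selected pointers declared as __global float4*) avoids the behaviour on the affected driver is untested.

```c
/* Plain-C stand-in for OpenCL's float4 (hypothetical helper type). */
typedef struct { float x, y, z, w; } float4;

/* One work-item with the buffers selected up front (hypothetical helper).
   Returns 1 if the work-item ran to the end, 0 if it returned early. */
static int select_item(int index,
                       const int *data1, const int *data2,
                       float4 *quats1, float4 *quats2,
                       float4 *xyz1, float4 *xyz2)
{
    int check = data1[index] != 0;

    /* Same early exit as the original else-branch. */
    if (!check && data2[index] == 0) return 0;

    /* Select the pair of buffers once; everything below is shared. */
    float4 *quats = check ? quats1 : quats2;
    float4 *xyz   = check ? xyz1   : xyz2;

    float4 quat = quats[index];

    /* Stand-in for the elided block: quat is left unchanged and the
       position is the constant (1, 2, 3) from the pseudocode. */
    const float px = 1.0f, py = 2.0f, pz = 3.0f;

    /* Single write-back path; the w component is left untouched. */
    xyz[index].x += px;
    xyz[index].y += py;
    xyz[index].z += pz;
    quats[index] = quat;
    return 1;
}
```

This removes the second if-statement entirely, so the read and the write cannot end up on different buffers for the same work-item.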

 

Any ideas on what is happening here?

 

I'm running on Windows 7 (SP1) x64, Radeon RX 460, Radeon software version 18.1.1 and am building the source with the -cl-std=CL1.2 flag.

I have seen the effects of this issue in several previous driver versions as well.

 

Cheers,

 

  George
