cancel
Showing results for 
Search instead for 
Did you mean: 

OpenCL

george72
Adept I

Strange OpenCL kernel behavior

Hi, I'm working on a game-engine written in OpenCL/OpenGL and we're having several issues with getting our code to run properly on AMD GPU's. After comparing outputs from the code when run on nVidia versus AMD GPU's I noticed the following issue, which might seem to indicate an OpenCL compiler bug/problem:

The problem looks something like this (this is just pseudo code to illustrate the issue):

__kernel my_kernel(__global int* data1, __global int* data2, __global float4* quats1, __global float4* quats2, __global float4* xyz1, __global float4* xyz2)

{

  int index = get_global_id(0);

  float4 quat;

  bool check = data1[index] != 0;

  // First if-statement

  if (check)

  {

    quat = quats1[index];

  }

  else

  {

    if (data2[index] == 0) return;

    quat = quats2[index];

  }

  // This block of code (not shown) modifies the quat & generates a position

  // ...

  // ...

  float3 position = (float3)(1,2,3);

  // Second if-statement

  if (check)

  {

    float3 xyz = xyz1[index];

    xyz += position;

    xyz1[index] = xyz;

    quats1[index] = quat;

  }

  else

  {

    float3 xyz = xyz2[index];

    xyz += position;

    xyz2[index] = xyz;

    quats2[index] = quat;

  }

}

The problem I'm encountering is that for every thread for which "check" is true, the results seem to indicate that it executed the "else" clause of the first if-statement but works correctly in the second if-statement.

While trying to figure out what was going wrong, I added a few "printf" statements as follows:

__kernel my_kernel(__global int* data1, __global int* data2, __global float4* quats1, __global float4* quats2, __global float4* xyz1, __global float4* xyz2)

{

  int index = get_global_id(0);

  float4 quat;

  bool check = data1[index] != 0;

  // First if-statement

  if (check)

  {

    if (index == 255) printf("read1\n");
    quat = quats1[index];

  }

  else

  {

    if (index == 255) printf("read2\n");
    if (data2[index] == 0) return;

    quat = quats2[index];

  }

  // This block of code (not shown) modifies the quat & generates a position

  // ...

  // ...

  float3 position = (float3)(1,2,3);

  // Second if-statement

  if (check)

  {

    if (index == 255) printf("write1\n");
    float3 xyz = xyz1[index];

    xyz += position;

    xyz1[index] = xyz;

    quats1[index] = quat;

  }

  else

  {

    if (index == 255) printf("write2\n");
    float3 xyz = xyz2[index];

    xyz += position;

    xyz2[index] = xyz;

    quats2[index] = quat;

  }

}

But then the problem does not occur anymore!

Splitting this kernel into 2 separate kernels for each path, also fixes the problem but I don't like this solution since the code only differs in where the data is read from/written to, so that's how I'd like to write it.

Any ideas on what is happening here?

I'm running on Windows 7 (SP1) x64, Radeon RX 460, Radeon software version 18.1.1 and am building the source with the -cl-std=CL1.2 flag.

I have seen the effects of this issue in several previous driver versions as well.

Cheers,

  George

0 Likes
9 Replies
george72
Adept I

I did some more digging, and if I only add the first printf() statement, then the code works as expected.

If I disable optimizations with -cl-opt-disable then the code also works as expected.

It appears as though the optimizer makes a mistake when the code inside the if {} block is very small/trivial.

0 Likes

From your first post, I was suspecting the same that it might a compiler optimization issue. It's good that you've already confirmed the issue with -cl-opt-disable option.  Please share the repro kernel code so that I can open a ticket against it.

[Edit]: Just for testing, please check once with the latest Adrenalin 18.2.2

0 Likes

I'll try and isolate the issue and post the code here. I hope to have it ready sometime next week.

0 Likes

Testing with driver 18.2.2 shows the same behavior (unfortunately)

It took some effort, but I managed to isolate the issue. Although I tried to keep it as simple as possible, I had to bring in the actual data from our application into it to trigger the undesired behavior. I'm not sure if it only happens for certain input combinations, but simply generating some data did not seem to trigger the bug.

The .zip contains a C++ test project with CL source files and test data. I've tested it on Windows 7 & 10 (both 64-bit) with VS 2015 (64-bit) and building against OpenCL 1.2 headers.

I hope this helps to pinpoint the issue, I would appreciate feedback if you guys find something.

0 Likes

Thank you for sharing the repro.

While trying to build the project with AMD APP SDK, I was getting error against below lines in amd_test.cpp as there is no matching function found in the cl.hpp file. Did you use CL header files from APP SDK? If not, could you please try to build it with this and make the required changes?

           clog << "uploading input data" << endl;

            cl::Buffer buf_parameters           (context, parameters        .begin(), parameters        .end(), false, false, &result);  if (result != CL_SUCCESS) throw cl::Error(result, "clCreateBuffer() failed");

            cl::Buffer buf_error_count          (context, error_count       .begin(), error_count       .end(), false, false, &result);  if (result != CL_SUCCESS) throw cl::Error(result, "clCreateBuffer() failed");

            cl::Buffer buf_error_file           (context, error_file        .begin(), error_file        .end(), false, false, &result);  if (result != CL_SUCCESS) throw cl::Error(result, "clCreateBuffer() failed

.......

Regards,

0 Likes

Hi, this new version has been tested against the APP SDK 3.0 OpenCL header files.

0 Likes

Thank you, George. The new version seems building fine. I'll open a ticket against it.

0 Likes

I found at least one other instance of what seems to be this same bug in our project where the code executes an if-statement (without else clause) even though the condition is actually false. Disabling optimizations seems to work sometimes but not always. Adding a printf() call seems to resolve the issue reliably but I'm currently looking for a work around without using printf(). I was wondering if there is something that is done inside the printf() that I can use instead of calling printf() to get the same effect?

0 Likes

I've already opened a ticket against the issue that was reported earlier. Let the team investigates the issue and finds the reason for this behavior. Once I've any update, I'll share with you.

Regards,

0 Likes