9 Replies Latest reply on Mar 2, 2018 4:37 AM by dipak

    Strange OpenCL kernel behavior

    george72

      Hi, I'm working on a game-engine written in OpenCL/OpenGL and we're having several issues with getting our code to run properly on AMD GPU's. After comparing outputs from the code when run on nVidia versus AMD GPU's I noticed the following issue, which might seem to indicate an OpenCL compiler bug/problem:

       

      The problem looks something like this (this is just pseudo code to illustrate the issue):

       

      __kernel my_kernel(__global int* data1, __global int* data2, __global float4* quats1, __global float4* quats2, __global float4* xyz1, __global float4* xyz2)

      {

        int index = get_global_id(0);

        float4 quat;

        bool check = data1[index] != 0;

       

        // First if-statement

        if (check)

        {

          quat = quats1[index];

        }

        else

        {

          if (data2[index] == 0) return;

          quat = quats2[index];

        }

       

        // This block of code (not shown) modifies the quat & generates a position

        // ...

        // ...

        float3 position = (float3)(1,2,3);

       

        // Second if-statement

        if (check)

        {

          float3 xyz = xyz1[index];

          xyz += position;

          xyz1[index] = xyz;

       

          quats1[index] = quat;

        }

        else

        {

          float3 xyz = xyz2[index];

          xyz += position;

          xyz2[index] = xyz;

       

          quats2[index] = quat;

        }

      }

       

      The problem I'm encountering is that for every thread for which "check" is true, the results seem to indicate that it executed the "else" clause of the first if-statement but works correctly in the second if-statement.

       

      While trying to figure out what was going wrong, I added a few "printf" statements as follows:

       

      __kernel my_kernel(__global int* data1, __global int* data2, __global float4* quats1, __global float4* quats2, __global float4* xyz1, __global float4* xyz2)

      {

        int index = get_global_id(0);

        float4 quat;

        bool check = data1[index] != 0;

       

        // First if-statement

        if (check)

        {

          if (index == 255) printf("read1\n");
          quat = quats1[index];

        }

        else

        {

          if (index == 255) printf("read2\n");
          if (data2[index] == 0) return;

          quat = quats2[index];

        }

       

        // This block of code (not shown) modifies the quat & generates a position

        // ...

        // ...

        float3 position = (float3)(1,2,3);

       

        // Second if-statement

        if (check)

        {

          if (index == 255) printf("write1\n");
          float3 xyz = xyz1[index];

          xyz += position;

          xyz1[index] = xyz;

       

          quats1[index] = quat;

        }

        else

        {

          if (index == 255) printf("write2\n");
          float3 xyz = xyz2[index];

          xyz += position;

          xyz2[index] = xyz;

       

          quats2[index] = quat;

        }

      }

       

      But then the problem does not occur anymore!

      Splitting this kernel into 2 separate kernels for each path, also fixes the problem but I don't like this solution since the code only differs in where the data is read from/written to, so that's how I'd like to write it.

       

      Any ideas on what is happening here?

       

      I'm running on Windows 7 (SP1) x64, Radeon RX 460, Radeon software version 18.1.1 and am building the source with the -cl-std=CL1.2 flag.

      I have seen the effects of this issue in several previous driver versions as well.

       

      Cheers,

       

        George

        • Re: Strange OpenCL kernel behavior
          george72

          I did some more digging, and if I only add the first printf() statement, then the code works as expected.

          If I disable optimizations with -cl-opt-disable then the code also works as expected.

          It appears as though the optimizer makes a mistake when the code inside the if {} block is very small/trivial.

            • Re: Strange OpenCL kernel behavior
              dipak

              From your first post, I was suspecting the same that it might a compiler optimization issue. It's good that you've already confirmed the issue with -cl-opt-disable option.  Please share the repro kernel code so that I can open a ticket against it.

               

              [Edit]: Just for testing, please check once with the latest Adrenalin 18.2.2

                • Re: Strange OpenCL kernel behavior
                  george72

                  I'll try and isolate the issue and post the code here. I hope to have it ready sometime next week.

                  • Re: Strange OpenCL kernel behavior
                    george72

                    Testing with driver 18.2.2 shows the same behavior (unfortunately)

                     

                    It took some effort, but I managed to isolate the issue. Although I tried to keep it as simple as possible, I had to bring in the actual data from our application into it to trigger the undesired behavior. I'm not sure if it only happens for certain input combinations, but simply generating some data did not seem to trigger the bug.

                     

                    The .zip contains a C++ test project with CL source files and test data. I've tested it on Windows 7 & 10 (both 64-bit) with VS 2015 (64-bit) and building against OpenCL 1.2 headers.

                     

                    I hope this helps to pinpoint the issue, I would appreciate feedback if you guys find something.

                      • Re: Strange OpenCL kernel behavior
                        dipak

                        Thank you for sharing the repro.

                        While trying to build the project with AMD APP SDK, I was getting error against below lines in amd_test.cpp as there is no matching function found in the cl.hpp file. Did you use CL header files from APP SDK? If not, could you please try to build it with this and make the required changes?

                         

                         

                                   clog << "uploading input data" << endl;
                                    cl::Buffer buf_parameters           (context, parameters        .begin(), parameters        .end(), false, false, &result);  if (result != CL_SUCCESS) throw cl::Error(result, "clCreateBuffer() failed");
                                    cl::Buffer buf_error_count          (context, error_count       .begin(), error_count       .end(), false, false, &result);  if (result != CL_SUCCESS) throw cl::Error(result, "clCreateBuffer() failed");
                                    cl::Buffer buf_error_file           (context, error_file        .begin(), error_file        .end(), false, false, &result);  if (result != CL_SUCCESS) throw cl::Error(result, "clCreateBuffer() failed
                        
                        .......
                        
                        

                         

                        Regards,