15 Replies Latest reply on Sep 26, 2017 6:13 AM by dipak

    [BUG] local variables overwritten in OpenCL kernel

    dstarke

      I have a custom OpenCL kernel with a while loop and various local variables.

      These variables are sometimes overwritten (sometimes with 0, sometimes with NaN) when returning to the beginning of the loop.

      The issue is reproducible when using the same input values.

      The kernel works just fine with other vendors, thus I suspect a compiler bug.

      I have tested the issue on the following systems:

      - AMD Radeon HD 7800 Series

      Driver Version 22.19.162.4

      Windows 10 Education (Version 10.0.14393)

      - AMD Radeon R9 200 Series

      Driver Version 22.19.162.4

      Windows 10 Pro 64-Bit (Version 1607)

      - AMD Radeon 5800 Series

      Driver Version 15.200.1062.1004

      Windows 7 Home Premium (Version 6.1.7601 SP1 Build 7601)

       

      I can provide the kernel in source and binary form (AMD Radeon HD 7800 Series) if required, but preferably not publicly.

      This would also include a Windows application to reproduce the issue and example outputs from other vendors.

       

      Please also let me know whether this is the right place, or where else I should report this issue.

        • Re: [BUG] local variables overwritten in OpenCL kernel
          dipak

          Hi Daniel,

          Do you observe the same issue with the latest AMD driver as well? If yes, please share the repro code and system details so our team can investigate it here. I doubt that the concerned team will accept an issue arising from a custom kernel.

          Btw, you have been whitelisted now.

           

          Regards,

            • Re: [BUG] local variables overwritten in OpenCL kernel
              dstarke

              The driver version is the newest version provided by the automatic Windows update.

              As for the kernel and test application, please see https://filebin.ca/3Zhov3oL768B.

              You will find the output generated by AMD and by CPU (non-AMD vendor).

              The test application produces 3 outputs:

              - debug.txt (generated variable output)

              - debug.png (rendered pixels)

              - kernel.bin (binary kernel generated by the OpenCL driver)

              In debug.txt you will notice that column 15 (counting from 1) is 0 in some places (starting at iteration i7) for AMD but not for CPU.

              This value corresponds to mx in kernel.cl. When execution passes line 509 in the kernel and jumps back to line 428, the value of mx changes even though nothing in the code modifies the variable. That means the content of the local variable mx (along with others) changed due to the jump back to the beginning of the while loop.
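
              The check on column 15 described above could be automated roughly as follows. This is only a sketch: it assumes debug.txt holds one row per iteration with whitespace-separated fields, which is a guess about the file layout, and the function name suspicious_rows is made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based row numbers in which the given 1-based column
// parses as 0 or NaN. Assumes whitespace-separated fields, which is a
// guess about the debug.txt layout. Rows where the column is missing
// or not numeric are skipped.
std::vector<std::size_t> suspicious_rows(std::istream& in, std::size_t column) {
    std::vector<std::size_t> rows;
    std::string line;
    std::size_t row = 0;
    while (std::getline(in, line)) {
        ++row;
        std::istringstream fields(line);
        std::string field;
        std::size_t index = 0;
        while (fields >> field) {
            if (++index != column) {
                continue;
            }
            const char* begin = field.c_str();
            char* end = nullptr;
            const double value = std::strtod(begin, &end);
            // Accept the field only if it was consumed completely.
            const bool numeric = (end != begin) && (*end == '\0');
            if (numeric && (value == 0.0 || std::isnan(value))) {
                rows.push_back(row);
            }
            break;  // the column of interest has been handled
        }
    }
    return rows;
}
```

              Calling suspicious_rows on the AMD and the CPU debug.txt with column 15 and comparing the two results would show the first row where the outputs diverge.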

                • Re: [BUG] local variables overwritten in OpenCL kernel
                  dipak

                  Thanks for sharing the executable. When I ran it on a Carrizo, I didn't observe any erroneous zero values. In my case, the values in "debug.txt" (especially the 15th column, as you mentioned) were closer to your debug.txt under the cpu folder. At this moment, I don't know whether the issue is related to those cards or not. I'll try to get hold of one of those cards and check it. Btw, it would be helpful if the executable could select the target device so it could be run on the CPU too.

                  Regarding the driver version, please check "AMD Radeon Settings -> Software" for more details about the driver version / driver packaging version. The latest one is available here: Desktop.

                   

                  Regards,

                    • Re: [BUG] local variables overwritten in OpenCL kernel
                      dstarke

                      Thank you for testing. To rule out differences in behavior due to different rounding methods, please also try running the application with lines 543-546 of the kernel removed. This will render the whole scene. You can find a reference scene at https://imagebin.ca/v/3a31XSEkxrik. There should be no real visible differences.

                      The application itself uses Boost Compute internally, so you can change the device by defining the corresponding environment variables.

                      See boost/compute/system.hpp - 1.63.0.

                      The software version 17.1.1 was shown in the settings window for the driver.

                        • Re: [BUG] local variables overwritten in OpenCL kernel
                          dipak

                          Thanks. Actually, a more recent driver (17.9.1) is available here: Desktop. Please check with this and share your findings. Meanwhile, I'll try to reproduce it at my end.

                           

                          Regards

                            • Re: [BUG] local variables overwritten in OpenCL kernel
                              dstarke

                              Sorry for the late reply. Sadly, I have no means to update the drivers on the test system due to missing privileges.

                              How did it turn out on your side?

                                • Re: [BUG] local variables overwritten in OpenCL kernel
                                  dipak

                                  Running on an HD7870 with 17.9.1, I observed similar zero values in debug.txt. However, I couldn't run it on the CPU by setting the Boost environment variable (e.g. set BOOST_COMPUTE_DEFAULT_DEVICE_TYPE="CPU"). In this case, kernel.bin and debug.txt were always the same.

                                   

                                  Regarding removing lines 543-546 from the kernel file, I see the following code segment there. Are these the lines you meant?

                                  const int cy = y - convert_int(self.height / 2);

                                  const int cx = x - convert_int(self.width / 2);

                                  float3 value = (float3)(0.0f);

                                  if (self.light.y >= 0.0f) {

                                    • Re: [BUG] local variables overwritten in OpenCL kernel
                                      dstarke

                                      Sorry, it seems that I grabbed the wrong version of the kernel.cl file on my side. I meant to remove lines 539-542, the following code segment:

                                      if (x != 141 || y != 111) {
                                         out[(y * self.width) + x] = (uchar4)(0, 0, 0, 0);
                                         return;
                                      }

                                      This will make debug.txt quite useless, but will render the whole scene in debug.png. This should make it possible to see whether there are visible differences to the reference image https://imagebin.ca/v/3a31XSEkxrik. The code above selects a single pixel of the scene for deeper analysis. If you detect any differences, I can provide the reference values from debug.txt for the pixel in question.

                                      Also, for me the following invocations produced different results (even though the differences were quite marginal):

                                      ---------------------------------------------------------------------------------------------------------------------------------

                                      set BOOST_COMPUTE_DEFAULT_DEVICE_TYPE=CPU

                                      test.exe

                                      ---------------------------------------------------------------------------------------------------------------------------------

                                      set BOOST_COMPUTE_DEFAULT_DEVICE_TYPE=GPU

                                      test.exe
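
                                      One possible reason why the quoted form set BOOST_COMPUTE_DEFAULT_DEVICE_TYPE="CPU" had no effect: in cmd.exe, set stores the quotes as part of the value. Assuming the library compares the variable against the plain text CPU or GPU (I have not verified the exact matching logic in Boost Compute), a quoted value would not match. A minimal sketch with hypothetical helper names to illustrate the difference:

```cpp
#include <cstdlib>
#include <string>

// Reads the raw value of the variable; returns an empty string if unset.
std::string raw_device_type() {
    const char* value = std::getenv("BOOST_COMPUTE_DEFAULT_DEVICE_TYPE");
    return value ? std::string(value) : std::string();
}

// Removes one pair of surrounding double quotes, if present.
std::string strip_quotes(const std::string& value) {
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
        return value.substr(1, value.size() - 2);
    }
    return value;
}

// Hypothetical stand-in for a plain-text comparison against a device type.
// This is an assumption for illustration, not the Boost Compute code.
bool selects_type(const std::string& raw_value, const std::string& type) {
    return raw_value == type;
}
```

                                      With the quoted form, raw_device_type() would return the five characters "CPU" including both quotes, so selects_type(raw_device_type(), "CPU") is false until the value is passed through strip_quotes. Printing the raw value in brackets before starting test.exe is a quick way to check what the process actually sees.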