4 Replies Latest reply on Aug 17, 2016 1:08 AM by joej

    OpenCL Driver Bug FuryX 32bit


      I get wrong results from a simple prefix sum with the 32-bit build on Fiji. The 64-bit build works, and Tahiti works with both.


      I created a minimal project to reproduce:

      GitHub - JoeJGit/OpenCL_Fiji_Bug_Report: Expose a 32bit driver bug


      The project is in the zip file (I don't know how to upload a complete directory).

      Please take a look. I have a similar problem with Vulkan but have not yet been able to reproduce it in a small test. See bug report.txt for details.

        • Re: OpenCL Driver Bug FuryX 32bit

          Hi Joe,

          Thanks for reporting the issue.

          It seems to be a compiler optimization issue. I can see the same error even with an x64 build on my Hawaii card. After some experiments, I found the following workarounds:

          1) Disable the optimization during kernel build i.e. pass optimization flag "-O0" or "-cl-opt-disable"


          2) In PrefixSum() kernel (prefix_sum.cl), declare the following variables inside the loop as "volatile":

          for (uint step = 0; step < 8; step++)


          uint mask = ..

          uint rd_id = ...

          uint wr_id = ...
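To make the second workaround concrete, here is a minimal CPU-side sketch in plain C of a 256-element workgroup scan with 8 steps, with the three loop locals declared `volatile`. The index formulas for `mask`, `rd_id` and `wr_id` are assumptions (a generic Sklansky-style scan), not the code from prefix_sum.cl; the function name `prefix_sum_256` and the two macros are hypothetical as well.

```c
#include <stdint.h>

#define GROUP_ITEMS 256u               /* elements scanned by one workgroup */
#define GROUP_LANES (GROUP_ITEMS / 2u) /* emulated work-items */

/* In-place inclusive prefix sum over GROUP_ITEMS values (CPU emulation).
   The index math below is an assumed, generic scan pattern. */
static void prefix_sum_256(uint32_t *data)
{
    for (uint32_t step = 0; step < 8; step++) {
        /* On the GPU this inner loop is the set of work-items, and a
           barrier(CLK_LOCAL_MEM_FENCE) would separate the steps. */
        for (uint32_t lid = 0; lid < GROUP_LANES; lid++) {
            /* Workaround 2: the loop locals are declared volatile. */
            volatile uint32_t mask  = (1u << step) - 1u;
            volatile uint32_t rd_id = ((lid >> step) << (step + 1u)) + mask;
            volatile uint32_t wr_id = rd_id + 1u + (lid & mask);

            /* Add the last value of the lower half-block to one value
               of the upper half-block. */
            data[wr_id] += data[rd_id];
        }
    }
}
```

On a CPU the qualifier changes nothing observable; the sketch only shows where it would go in a loop of this shape.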




          Could you please try the above workarounds and share your observation?



            • Re: OpenCL Driver Bug FuryX 32bit

              Hi Dipak, yes, both suggested workarounds work for me.


              Disabling optimization also solved another issue that had just shown up in x64 for me.

              That's a nice stress test because it's a graphics app processing 60000 workgroups per frame, so I can see it stays stable with big workloads over time.

              Workgroups are either 64, 128 or 256 threads wide, and I need to disable the optimizer only for the 128 and 256 groups.

              Let me know if you want me to track down the origin of this different bug; maybe I can create a second test case.
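A minimal sketch of how the optimizer could be switched off only for the affected workgroup sizes, assuming each size is built as its own program. The helper name `build_options_for` is hypothetical; `-cl-opt-disable` is the flag suggested earlier in the thread.

```c
#include <stddef.h>

/* Returns the build-options string for a kernel variant of the given
   workgroup size (hypothetical helper, not part of the test project). */
static const char *build_options_for(size_t workgroup_size)
{
    /* As observed above: 64-wide groups were fine with optimization on,
       while 128- and 256-wide groups needed it off. */
    if (workgroup_size == 128 || workgroup_size == 256)
        return "-cl-opt-disable";
    return "";
}
```

The returned string would be passed as the `options` argument of `clBuildProgram` when building the variant for that size.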


              Do you think the same compiler issue can explain similar bugs in Vulkan?

              In Vulkan the behaviour is very different: No bugs show up for about 10 - 30 frames, then they start popping up with increasing frequency.

              And if I remember correctly, bugs also happen with a workgroup size of 64, so it's not necessarily just a wavefront sync issue.




              The Vulkan bug has magically disappeared. I did not change the shader and can't remember any relevant changes in the project - no clue why it's gone.

              My guess is that another shader executed before and replaced in the meantime may have caused some kind of corruption - or something completely different...

                • Re: OpenCL Driver Bug FuryX 32bit

                  Thank you Joe for the confirmation. I'll open a ticket for that optimization issue.


                  "Let me know if you want me to track down the origin of this different bug, maybe I can create a second test case."

                  Sure, you can share the test case. I would encourage you to create a new thread for the second one if it's a different bug. It would help us track it in the future.



                    • Re: OpenCL Driver Bug FuryX 32bit

                      It turned out my second OpenCL bug was my own fault.


                      One more issue, maybe not worth its own thread:

                      I often use half floats to compress data in LDS.

                      Mostly this gives me a speedup close to 2x as it helps to increase occupancy.

                      But on complex shaders VGPR usage increases by large numbers (according to CodeXL), and compression causes a slowdown.

                      It looks like the compiler does not free the temporary registers used for the conversion by convert_float4.
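For readers unfamiliar with the technique, here is a rough CPU-side sketch in plain C of what storing values as half floats means: each value takes 2 bytes instead of 4 and is converted back to float before use. This is an illustration under simplifying assumptions (mantissa truncation, subnormals flushed to zero, NaN mapped to infinity), not the kernel code discussed here and not how the GPU hardware converts; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Pack a 32-bit float into a 16-bit half (simplified: truncates the
   mantissa, flushes tiny values to zero, maps overflow and NaN to inf). */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t sign = (bits >> 16) & 0x8000u;
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15; /* rebias */
    uint32_t mant = bits & 0x7FFFFFu;

    if (exp <= 0)
        return (uint16_t)sign;               /* too small: signed zero */
    if (exp >= 31)
        return (uint16_t)(sign | 0x7C00u);   /* too large: infinity */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}

/* Expand a 16-bit half back to a 32-bit float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    uint32_t bits;

    if (exp == 0)
        bits = sign;                                   /* zero */
    else if (exp == 31)
        bits = sign | 0x7F800000u | (mant << 13);      /* inf / NaN */
    else
        bits = sign | ((exp - 15u + 127u) << 23) | (mant << 13);

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values that need more than 10 mantissa bits lose precision on the round trip, which is the price paid for halving the storage.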


                      The same code in Vulkan still shows an improvement when using compression; the Vulkan shader is 4 times faster than the OpenCL kernel.

                      (Vulkan is generally faster, but usually only by about 10-20%.)