4 Replies Latest reply on Mar 13, 2014 8:51 AM by leekiju

    barrier functions makes wrong operation.

    leekiju

      After first posting my question, I tested this issue further.

       

      "After calling the barrier function, the value in memory qualified as __local is changed."

       

       

      Here is my environment

      Processor : Intel i-5-2410M @2.3GHz

      SDK version: AMD APP SDK v2.9

      IDE: Visual Studio 2010

       

       

      I was able to narrow the problem down.

      The problem occurs when I use barrier while reading and writing data in an array qualified as __local.

      I have not found any restriction saying that such a memory area may be used only for reading or only for writing.

      However, it behaves as if there were a cache: when local memory is read, the value is kept in the cache, and it is flushed when I call barrier with the argument "CLK_LOCAL_MEM_FENCE".

      The original description of "CLK_LOCAL_MEM_FENCE" is as follows:

       

      CLK_LOCAL_MEM_FENCE: The barrier function will either flush any variables stored in local memory or queue a memory fence to ensure correct ordering of memory operations to local memory.

       

      But I don't know whether this phenomenon comes from a mistake on my side or from a potential bug.

      Can anyone suggest a solution?

       

      I attached another document and a new test kernel.

      The file names are "Barrier Error2.docx"

      and "BitonicSort_Kernels.zip"

       

      and the function for testing is "TEST_BARRIER_0" in BitonicSort_Kernel.cl

       

       

       

      =================================================================================================================

      I'm testing a bitonic sort kernel that uses local memory. During the test I saw an interesting phenomenon when I call the barrier function before async_work_group_copy. The kernel under test is "TEST_BARRIER_0" in BitonicSort_Kernel.cl.

      I think it should be valid to use barrier at that position in "TEST_BARRIER_0".

       

      If I call the barrier function, the local data reverts to its previous values.

      I attached the code and a screenshot captured with the kernel debugger.

       

      If anyone has had the same experience or knows what I am doing wrong, please let me know.

      It would be very helpful to me!

      Thanks.

        • Re: barrier functions makes wrong operation.
          amd_support

          Hi,

          Sorry for the delay in responding. We are looking into your issue with the available versions of the kernel debugger and will get back to you very soon.

           

          With thanks,

          AMD Support

            • Re: barrier functions makes wrong operation.
              amd_support

              Hi,

              Thanks for your patience. Could you give some details regarding the version of CodeXL you are using in Visual Studio 2010? Also, if you are not already using the latest CodeXL release, do you see the same behaviour there?

                • Re: barrier functions makes wrong operation.
                  leekiju

                  Hi,

                  I installed CodeXL, but my only device is "Processor : Intel i-5-2410M @2.3GHz", so I get the message

                  "You don't have a supported AMD GPU. OpenCL kernel debugging will be disable!"

                   

                  I debugged it in VC2010 using the option

                  "-g -s \"C:/Users/Samsung/AMD APP SDK/2.9/samples/opencl/cl/BitonicSort/BitonicSort_Kernels.cl\""

                  passed to clBuildProgram.
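
                  For readers who want to reproduce this, a minimal sketch of how such an option string could be assembled on the host. The helper name make_debug_build_options and the buffer handling are assumptions for illustration and are not part of any SDK; the "-g -s" form is simply the one quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an SDK function): writes
 *     -g -s "<kernel_path>"
 * into out. Returns 0 on success, -1 if out is too small.
 */
int make_debug_build_options(char *out, size_t out_size, const char *kernel_path)
{
    assert(out != NULL && kernel_path != NULL);

    /* snprintf returns the length it wanted to write, excluding the NUL. */
    int n = snprintf(out, out_size, "-g -s \"%s\"", kernel_path);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

                  The resulting string would then be passed as the options argument (the fourth parameter) of clBuildProgram.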

                   

                  This problem shows up in the bitonic sort kernel.

                  1. The bitonic sort kernel works fine in the AMD sample code.

                  2. I changed the input buffer (which is also the output buffer) to __local, using the "async_work_group_copy" function.

                  3. I compared the reference result, which was generated by the host,

                  with the result of the modified bitonic sort kernel, which uses the "async_work_group_copy" function.

                  4. The two results are not the same. After checking, I reduced the case to something simpler,

                  namely the "TEST_BARRIER_0" kernel. The issue with that kernel is that the "__local buffer" is used as both input and output,

                  and after the barrier function that buffer seems to be flushed.
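
                  To make the intended ordering in steps 2 to 4 concrete, here is a serial host-side C model of one work-group sorting through a buffer that stands in for the __local array. It is only a sketch under assumptions: the function name, GROUP_ITEMS and the pair indexing are illustrative and are not taken from the attached TEST_BARRIER_0 kernel. The memcpy calls stand in for async_work_group_copy followed by wait_group_events, and the end of each inner loop over work-items stands in for barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>
#include <string.h>

#define GROUP_ITEMS 8          /* hypothetical work-group size */
#define N (2 * GROUP_ITEMS)    /* each work-item handles one pair per pass */

/* Serial model of one work-group running a bitonic sort in "local" memory. */
void bitonic_sort_group_model(const unsigned *global_in, unsigned *global_out)
{
    unsigned local_buf[N];     /* stands in for a __local array */

    assert((N & (N - 1)) == 0);  /* the network needs a power-of-two length */

    /* Models async_work_group_copy(global -> local) + wait_group_events. */
    memcpy(local_buf, global_in, sizeof local_buf);

    for (unsigned k = 2; k <= N; k *= 2) {          /* bitonic stage size */
        for (unsigned j = k / 2; j > 0; j /= 2) {   /* compare distance */
            /* Every work-item does one compare-exchange in this pass. */
            for (unsigned lid = 0; lid < GROUP_ITEMS; ++lid) {
                unsigned i = (lid / j) * 2 * j + (lid % j);
                unsigned l = i + j;
                int ascending = ((i & k) == 0);
                unsigned a = local_buf[i];
                unsigned b = local_buf[l];

                if (ascending ? (a > b) : (a < b)) {
                    local_buf[i] = b;
                    local_buf[l] = a;
                }
            }
            /* End of pass: models barrier(CLK_LOCAL_MEM_FENCE), so the next
               pass only sees values written by the completed pass. */
        }
    }

    /* Models async_work_group_copy(local -> global) + wait_group_events. */
    memcpy(global_out, local_buf, sizeof local_buf);
}
```

                  Because the model runs the work-items one after another, it cannot reproduce the behaviour reported here; it only shows which reads and writes each barrier is expected to separate.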

                   

                  I also tested the "TEST_BARRIER_0" kernel with the Intel OpenCL SDK.

                  The result is the same as with the AMD OpenCL SDK, that is, "wrong operation".

                   

                  I also tested "TEST_BARRIER_0" and the modified bitonic sort kernel, which uses a __local-qualified buffer as input and output,

                  on an AMD GPU in my co-worker's PC.

                  There the result is correct.

                   

                  I think the problem is related to my processor, "Processor : Intel i-5-2410M @2.3GHz".