Barrier function causes incorrect behavior

After first posting my question, I tested this issue further.

"After using the barrier function, the value in memory that is qualified as __local is changed."

Here is my environment:

Processor: Intel Core i5-2410M @ 2.3 GHz

SDK version: AMD APP SDK v2.9

IDE: Visual Studio 2010

I was able to narrow the problem down.

It appears when I use barrier while reading and writing data in an array that is qualified as __local.

I have not found any restriction saying that a __local memory area may be used only for reading or only for writing.

But it behaves as if there were a cache: when local memory is read, the value is kept in the cache, and the cache is flushed when I call barrier with the argument "CLK_LOCAL_MEM_FENCE".

The original description of "CLK_LOCAL_MEM_FENCE" is as follows:

CLK_LOCAL_MEM_FENCE: The barrier function will either flush any variables stored in local memory or queue a memory fence to ensure correct ordering of memory operations to local memory.

But I don't know whether this phenomenon comes from my own misuse or from a potential bug.

Can anyone suggest a solution?

I attached another document and a new test kernel.

The file names are "Barrier Error2.docx"

and ""

and the function for testing is "TEST_BARRIER_0" in


I'm testing a bitonic sort kernel that uses local memory. During the test, I saw an interesting phenomenon when I call the barrier function before async_work_group_copy. I'm testing the "TEST_BARRIER_0" kernel function in

I think barrier can legitimately be used at that position in "TEST_BARRIER_0".

If I call the barrier function, the local data reverts to its previous values.

I attached the code and a screenshot captured in the kernel debugger.

If anyone has had the same experience or knows what I am doing wrong, please let me know.

It would be very helpful to me!


3 Replies


Sorry for the delay in responding. We are looking into your issue with the available versions of the kernel debugger and will get back to you very soon.

With thanks,

AMD Support


Hi,

Thanks for your patience. Could you give some details regarding the version of CodeXL used with Visual Studio 2010? Also, if you have not tried it yet, please check whether you see the same behaviour in the latest CodeXL release.


I installed CodeXL, but my only OpenCL device is the CPU (Intel Core i5-2410M @ 2.3 GHz), so I get the message

"You don't have a supported AMD GPU. OpenCL kernel debugging will be disable!"

I debugged it in VC2010 using the option

"-g -s \"C:/Users/Samsung/AMD APP SDK/2.9/samples/opencl/cl/BitonicSort/\""

in clBuildProgram
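For anyone reproducing this, the quoting matters because the path contains spaces. Below is a minimal host-side sketch of composing that option string; `make_debug_build_options` is a hypothetical helper name (not part of any SDK), and the flags are simply the ones quoted above.

```c
#include <stdio.h>

/* Hypothetical helper (not an SDK function): writes
 *     -g -s "<kernel_path>"
 * into out, quoting the path so that spaces survive.
 * Returns 0 on success, -1 if the buffer is too small. */
static int make_debug_build_options(char *out, size_t out_size,
                                    const char *kernel_path)
{
    int n = snprintf(out, out_size, "-g -s \"%s\"", kernel_path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Assumed usage with the OpenCL host API (program and device must
 * already exist; kernel_path is your own kernel source location):
 *
 *     char opts[512];
 *     if (make_debug_build_options(opts, sizeof opts, kernel_path) == 0)
 *         clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

The options are passed as the fourth argument of clBuildProgram, so the string has to be built before the program is compiled.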

This problem comes from the bitonic sort kernel.

1. The bitonic sort kernel works fine in the AMD sample code.

2. I changed the input buffer (which is also the output buffer) to a __local buffer, filled with the "async_work_group_copy" function.

3. I compared the reference result, which was generated on the host, with the result of the modified bitonic sort kernel, which uses the "async_work_group_copy" function.

4. Those two results are not the same. After checking this, I reduced the kernel to a simpler one, the "TEST_BARRIER_0" kernel. The issue in that kernel is that the __local buffer is used as both input and output, and after the barrier function that buffer seems to be flushed.
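The TEST_BARRIER_0 source is only in the attachment, so what follows is not that kernel. It is a minimal sketch of the pattern described in the steps above, assuming a single work-group and a made-up kernel name `test_barrier_sketch`: copy into a __local buffer, update it in place, call barrier, then copy it back. The `#ifndef __OPENCL_VERSION__` part is an assumed host-side shim that runs the same code as plain C with one work-item, purely so the logic can be checked without a device.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (an assumption for testing, not part of OpenCL):
 * emulates a work-group of exactly one work-item in plain C. */
#include <string.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 1
typedef int event_t;
static size_t get_local_id(unsigned int dim)   { (void)dim; return 0; }
static size_t get_local_size(unsigned int dim) { (void)dim; return 1; }
static void barrier(int flags) { (void)flags; } /* one work-item: no-op */
static event_t async_work_group_copy(unsigned int *dst,
                                     const unsigned int *src,
                                     size_t n, event_t ev)
{
    memcpy(dst, src, n * sizeof *dst);
    return ev;
}
static void wait_group_events(int n, event_t *evs) { (void)n; (void)evs; }
#endif

/* Sketch only: data holds n values for one work-group, scratch is a
 * __local buffer with room for n values. */
__kernel void test_barrier_sketch(__global unsigned int *data,
                                  __local unsigned int *scratch,
                                  unsigned int n)
{
    size_t lid = get_local_id(0);
    size_t lsz = get_local_size(0);

    /* Global -> local. Wait for the copy before touching scratch. */
    event_t ev = async_work_group_copy(scratch, data, n, 0);
    wait_group_events(1, &ev);

    /* The __local buffer is both input and output: each work-item
     * updates its own elements in place. */
    for (size_t i = lid; i < n; i += lsz)
        scratch[i] += 1;

    /* Make every work-item's writes visible before the group-wide copy.
     * All work-items in the group must reach this barrier. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Local -> global, again waiting for completion. */
    ev = async_work_group_copy(data, scratch, n, 0);
    wait_group_events(1, &ev);
}
```

On a real device the scratch argument would be sized from the host with clSetKernelArg(kernel, 1, n * sizeof(cl_uint), NULL). Two things worth checking against the real kernel: every work-item in the group has to reach the barrier and both async_work_group_copy calls with the same arguments, and each copy needs its wait_group_events before the destination is read.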

I also tested the "TEST_BARRIER_0" kernel with the Intel OpenCL SDK.

The result is the same as with the AMD OpenCL SDK, that is, the wrong behavior.

I also tested "TEST_BARRIER_0" and the modified bitonic sort kernel, which uses a __local buffer as input and output,

on an AMD GPU in my co-worker's PC.

There the result is correct.

I think the problem is related to my processor (Intel Core i5-2410M @ 2.3 GHz).