Well, it is clear that barriers are necessary in almost every kernel for synchronization purposes. But can barriers also have some impact on the wavefront scheduler, for example by making it execute the wavefronts of a work-group closer to each other around a barrier? Just speculating.
Another query is about the mem_fence function. It looks like a synchronization facility, but it is not blocking in nature. Is there any situation where it would be preferred over a barrier?
mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE):
waits until all reads/writes to local and/or global memory made by the calling work-item prior to mem_fence() are visible to all threads in the work-group.

barrier(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE):
waits until all work-items in the work-group have reached this point and calls mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE).
http://developer.download.nvidia.com/presentations/2009/SIGGRAPH/asia/3_OpenCL_Programming.pdf
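To make the difference concrete, here is a minimal sketch (the kernel name, buffer names, and the assumed work-group size of 64 are illustrative, not from the slides): barrier() blocks every work-item until the whole work-group has arrived, while mem_fence() only orders the calling work-item's own memory operations and makes nobody wait.

__kernel void sync_example(__global const float *in, __global float *out)
{
    __local float tile[64];   /* sketch assumes a work-group size of 64 */

    int lid = get_local_id(0);
    int gid = get_global_id(0);

    tile[lid] = in[gid];

    /* barrier: blocking. No work-item in the work-group proceeds past this
       point until all of them have reached it, and the local-memory writes
       above are then visible to the whole work-group. Needed here because
       each work-item reads a neighbour's element next. */
    barrier(CLK_LOCAL_MEM_FENCE);

    out[gid] = tile[(lid + 1) % 64];

    /* mem_fence: non-blocking. It only guarantees that the calling
       work-item's prior accesses (the global write above) become visible
       before any memory access it issues after the fence; no other
       work-item is stalled by it. */
    mem_fence(CLK_GLOBAL_MEM_FENCE);
}

So mem_fence would be preferred where you only need ordering of one work-item's own reads/writes, not a rendezvous of the whole work-group.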
But can barriers also have some impact on the wavefront scheduler, for example by making it execute the wavefronts of a work-group closer to each other around a barrier? Just speculating.
Yes, barriers can improve performance somewhat. Usually this is because the barrier constrains the timing relationship between the wavefronts of a work-group: they are forced to advance together. If the memory access stride is well designed, keeping the wavefronts together in this way can yield some performance improvement.
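As an illustration of that point (a hypothetical sketch; the kernel and its parameters are made up, and any benefit is hardware dependent), the barrier below carries no data dependency at all. It only keeps all wavefronts of the work-group on the same loop iteration, so their reads stay within the same contiguous slice of the input at roughly the same time.

__kernel void strided_sum(__global const float *in,
                          __global float *out,
                          int slices)
{
    int gid = get_global_id(0);
    int gsz = get_global_size(0);
    float acc = 0.0f;

    for (int s = 0; s < slices; ++s) {
        /* each iteration touches one contiguous slice of 'in' */
        acc += in[s * gsz + gid];

        /* no local data is shared; the barrier is here purely so the
           work-group's wavefronts advance through the slices together,
           which can improve locality of the global reads */
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    out[gid] = acc;
}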