heman
Adept II

Can barriers improve performance

It is clear that barriers are needed for synchronization in almost every kernel. But can barriers also affect the wavefront scheduler, for example by making the wavefronts of a work-group execute more closely together in time at barriers? Just speculating.

My other question is about the mem_fence function. It looks like a synchronization primitive, but it is not blocking. Is there any situation where it would be preferred over a barrier?

binying
Challenger

mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE):

waits until all reads/writes to local and/or global memory made by the calling work-item prior to mem_fence() are visible to all threads in the work-group.

barrier(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE):

waits until all work-items in the work-group have reached this point, then calls mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE).

http://developer.download.nvidia.com/presentations/2009/SIGGRAPH/asia/3_OpenCL_Programming.pdf
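
To make the difference concrete, here is a minimal sketch in OpenCL C. The kernel names (group_sum, publish) and all argument names are made up for illustration. The first kernel assumes a power-of-two work-group size and a local buffer with one float per work-item. The second only orders the calling work-item's own stores; whether and when another work-item observes them depends on the OpenCL version and the device, so treat it as an illustration of the call, not a portable signalling scheme.

```c
// Hypothetical kernel: sums each work-group's slice of `in` into out[group].
// Assumes a power-of-two work-group size and one float of `scratch` per work-item.
__kernel void group_sum(__global const float *in,
                        __global float *out,
                        __local float *scratch)
{
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);

    scratch[lid] = in[get_global_id(0)];
    // Every work-item must have stored its value before any work-item
    // reads a neighbour's slot, so a full barrier is required here.
    barrier(CLK_LOCAL_MEM_FENCE);

    for (size_t s = lsz / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        // Kept outside the `if` so that all work-items in the group reach it.
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}

// Hypothetical kernel: each work-item writes a value, then a flag.
// mem_fence only orders this work-item's own stores; it waits for nobody.
__kernel void publish(__global float *payload,
                      __global volatile int *ready)
{
    const size_t gid = get_global_id(0);

    payload[gid] = (float)gid * 2.0f;
    // The payload store is ordered before the flag store below.
    mem_fence(CLK_GLOBAL_MEM_FENCE);
    ready[gid] = 1;
}
```

A barrier must be reached by every work-item in the work-group (or by none), so it cannot sit inside divergent control flow. mem_fence has no such restriction, which is one case where it may be the better fit: the work-item only needs its own memory operations ordered and does not need to wait for the rest of the group.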

drallan
Challenger

But can there be some impact of barriers on the wavefront scheduler, like trying to execute wavefronts in a workgroup more closely to each other at barriers? Just speculating

Yes, barriers can improve performance somewhat. Usually this is because a barrier constrains the relative timing of the wavefronts in a work-group. If the memory access stride is well designed, that can yield some performance improvement.
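
As a rough illustration of that idea, the sketch below walks a work-group through its data one tile at a time and places a barrier at the end of each iteration. The kernel name scale_tiles and the parameters tiles_per_group and factor are hypothetical. The barrier is not needed for correctness here; it is only there to keep the wavefronts of a group on the same tile, so that their unit-stride accesses land near each other in time. Whether this helps or hurts depends on the device and should be measured.

```c
// Hypothetical kernel: each work-group processes a contiguous span of `in`,
// one tile of get_local_size(0) elements per loop iteration.
// Assumes both buffers hold num_groups * local_size * tiles_per_group floats
// and that tiles_per_group is the same for every work-item.
__kernel void scale_tiles(__global const float *in,
                          __global float *out,
                          const int tiles_per_group,
                          const float factor)
{
    const size_t lid  = get_local_id(0);
    const size_t lsz  = get_local_size(0);
    const size_t base = get_group_id(0) * lsz * (size_t)tiles_per_group;

    for (int t = 0; t < tiles_per_group; ++t) {
        // Consecutive work-items touch consecutive addresses within the tile.
        const size_t idx = base + (size_t)t * lsz + lid;
        out[idx] = in[idx] * factor;

        // Optional: no wavefront starts tile t+1 until all wavefronts in the
        // group have finished tile t. Every work-item runs the same number of
        // iterations, so the barrier is reached uniformly.
        barrier(CLK_GLOBAL_MEM_FENCE);
    }
}
```

Removing the barrier gives the same results, so timing the kernel both ways is a simple way to check whether the effect shows up on a given GPU.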