spectral
Adept II

barrier or write_mem_fence

Hi,

I have some doubts about the use of 'write_mem_fence' and 'barrier'.

Here is my code:

__local WorkItemData workItemData;
const size_t lid = get_local_id(0);

if (lid == 0)
{
    workItemData.sky         = sky;
    workItemData.lightsCount = lightsCount;
    workItemData.emitters    = emitters;
}

// Sync: wait until 'workItemData' is fully initialized
barrier(CLK_LOCAL_MEM_FENCE);
//write_mem_fence(CLK_LOCAL_MEM_FENCE);

The goal is to ensure that the variable 'workItemData' is fully initialized. Can I use write_mem_fence instead? Would it be faster and more efficient?

Is there some other way to improve performance in this case? For example, if I put more operations between the initialization and the first use, I sometimes don't need to synchronize (but that is dangerous!). Maybe there are other ways.

Thanks

As you point out, it's a synchronization issue, so it needs a synchronization operation. In OpenCL, the only synchronization operations between work items are barriers. The fence only guarantees that the compiler and hardware won't reorder writes across it; it does not make work items wait for each other.

Given that the hardware runs 64 work items simultaneously as one wavefront, you could in theory drop the barrier if your group has only 64 work items. However, in that case the compiler will remove the barrier for you anyway, so in practice you might as well leave it in. If you have more than 64 work items, you need the barrier to ensure that the other wavefronts wait for the one that did the writes.
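
To make this concrete, here is a minimal sketch of a complete kernel built around the snippet from the question. The kernel name `render`, the `out` buffer, the `float` element type and the layout of `WorkItemData` are all assumptions made for illustration; the original post does not show them. The `#ifndef __OPENCL_VERSION__` block at the top is only a host-side shim so that the sketch can also be compiled as plain C with a single emulated work item; it does not exercise the barrier, and an OpenCL compiler skips it.

```c
/* Host-only shim (assumption, for a quick plain-C check). An OpenCL C
   compiler defines __OPENCL_VERSION__ and ignores this whole block. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __local static
#define CLK_LOCAL_MEM_FENCE 1
static size_t get_local_id(unsigned dim)  { (void)dim; return 0; } /* one emulated work item */
static size_t get_global_id(unsigned dim) { (void)dim; return 0; }
static void barrier(int flags) { (void)flags; } /* no-op: nothing to wait for on the host */
#endif

/* Hypothetical layout: the thread only says the members are pointers to buffers. */
typedef struct {
    __global const float *sky;
    __global const float *emitters;
    int lightsCount;
} WorkItemData;

__kernel void render(__global const float *sky,
                     __global const float *emitters,
                     int lightsCount,
                     __global float *out)
{
    __local WorkItemData workItemData;
    const size_t lid = get_local_id(0);

    /* Exactly one work item per group writes the shared structure. */
    if (lid == 0)
    {
        workItemData.sky         = sky;
        workItemData.lightsCount = lightsCount;
        workItemData.emitters    = emitters;
    }

    /* Every work item in the group stops here until all have arrived,
       and the local-memory writes above are visible to all of them. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* From here on it is safe for any work item to read workItemData. */
    out[get_global_id(0)] = workItemData.sky[0]
                          + workItemData.emitters[0]
                          + (float)workItemData.lightsCount;
}
```

Note that the barrier sits outside the `if`, so every work item in the group reaches it. Putting it inside the branch would mean only one work item executes it, which OpenCL does not allow.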

binying
Challenger

What kind of data are sky, emitters and lightsCount? This post might be helpful, though:

http://devgurus.amd.com/message/1242922#1242922

"Will it be faster and more efficient?" I would compare the generated assembly for both versions in Kernel Analyzer.

These are pointers to buffers. So my goal is to ensure they are available in all the work-items.

I'm not sure that analyzing the binaries will help! It is a question about synchronization 😛, not instructions.

Thanks a lot, Lee, for your complete explanation.