Hi,
I have some doubts about the use of 'write_mem_fence' and 'barrier'.
Here is my code :
__local WorkItemData workItemData;
const size_t lid = get_local_id(0);
if (lid < 1)
{
    workItemData.sky = sky;
    workItemData.lightsCount = lightsCount;
    workItemData.emitters = emitters;
}
//---- Sync : wait until 'workItemData' is fully initialized
barrier(CLK_LOCAL_MEM_FENCE);
//write_mem_fence(CLK_LOCAL_MEM_FENCE);
The goal is to ensure that the variable 'workItemData' is fully initialized before it is used. Can I use a write_mem_fence instead? Would it be faster and more efficient?
Is there some other way to improve performance in this case? For example, if I put more operations between the initialization and the use, sometimes I don't need to synchronize (but that is dangerous)! Maybe there are other ways.
Thanks
As you point out, it's a synchronization issue, so it needs a synchronization operation. In OpenCL the only synchronization operations are barriers. A fence only guarantees that the compiler and hardware won't reorder writes across it; it won't make work items wait for each other.
Given that the hardware runs 64 work items simultaneously, if you have only 64 work items in your group you could in theory drop the barrier. However, in that case the compiler will remove it for you anyway, so in practice you might as well put it in. If you have more than 64 work items, you need the barrier to ensure that the other wavefronts wait for the one that did the writes.
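A minimal sketch of the pattern described above, put back into a full kernel. The kernel signature and the Sky/Emitter element types are assumptions for illustration; only the WorkItemData fields and the barrier come from the original post:

```c
// OpenCL C device code (sketch, assumed kernel signature and types).
typedef struct
{
    __global const Sky     *sky;         // assumed buffer element types
    __global const Emitter *emitters;
    int                     lightsCount;
} WorkItemData;

__kernel void shade(__global const Sky     *sky,
                    __global const Emitter *emitters,
                    int                     lightsCount)
{
    __local WorkItemData workItemData;

    // A single work item writes the shared local struct.
    if (get_local_id(0) == 0)
    {
        workItemData.sky         = sky;
        workItemData.emitters    = emitters;
        workItemData.lightsCount = lightsCount;
    }

    // barrier, not write_mem_fence: every work item in the group must
    // WAIT here, and CLK_LOCAL_MEM_FENCE makes the local writes visible
    // to all of them. A fence alone only orders memory operations within
    // the work item that executes it; it does not make the other work
    // items wait for the writer.
    barrier(CLK_LOCAL_MEM_FENCE);

    // ... from here on, every work item can read workItemData safely ...
}
```

Note that barrier() must be reached by all work items in the group (it cannot sit inside divergent control flow), which is why it is placed after the if block rather than inside it.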
What kind of data are sky, emitters and lightsCount? This post would be helpful, though.
Will it be faster and more efficient? -- I would compare the generated assembly for both versions with the Kernel Analyzer.
These are pointers to buffers, so my goal is to ensure they are available in all the work items.
Not sure that analyzing the binaries will help! It is a question about synchronization 😛, not instructions.
Thanks a lot, Lee, for your complete explanation.