So I've read the Khronos docs over and over again and I still can't figure out when it is correct to use a mem_fence. The explanation is somewhat imprecise: I understand that it tells the compiler not to reorder memory loads and stores across the fence, BUT:
1. Which loads and stores should be a concern? For example, it seems reasonable that the following could be problematic (Listing 1 - it's my code). I am loading from a global, modifying, then storing back. But, is it truly expected that the compiler could reorder these loads and stores so they could happen in the opposite order?
2. In the same listing, is it only the original global (v_bodies) that is of concern (if at all), or is the second global (v_prev_bodies) a problem too? If so, then it seems like all of our code should be littered with mem_fences, since every kernel pretty much boils down to load from global, do stuff, store to global.
3. Aside from that, it is clear to me what mem_fence does. But what about its little cousins read_mem_fence and write_mem_fence? What exactly are they preventing the compiler from doing? Is read_mem_fence making sure that all reads happen in the order declared? What is an example of this being a problem? Ditto for the write_ function.
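To make question 3 concrete, here is my best guess at the kind of code where the order of two stores (or two loads) to different addresses might matter. It is only a sketch: publish, consume, payload and ready are names I made up, and the #ifndef block is a shim I added so the same text also compiles as plain C on the host. I'm not claiming this is how the fences are meant to be used, and whether a different work-group is guaranteed to observe the order at all is a separate question I can't answer.

```c
/* Host-side shim (my own addition): an OpenCL compiler defines
   __OPENCL_VERSION__ and skips this block; a plain C compiler uses it. */
#ifndef __OPENCL_VERSION__
#define __global
#define CLK_GLOBAL_MEM_FENCE 0
#define write_mem_fence(flags) ((void)(flags))
#define read_mem_fence(flags)  ((void)(flags))
#endif

/* Hypothetical producer: writes the payload, then raises the flag. */
void publish(__global float *payload, volatile __global int *ready, float value)
{
    *payload = value;                        /* store 1: the data */
    write_mem_fence(CLK_GLOBAL_MEM_FENCE);   /* keep store 1 ahead of store 2 */
    *ready = 1;                              /* store 2: "the data is valid" */
}

/* Hypothetical consumer: trusts the payload only if the flag is set. */
float consume(const __global float *payload,
              const volatile __global int *ready,
              float fallback)
{
    if (*ready == 0)                         /* load 1: the flag */
        return fallback;
    read_mem_fence(CLK_GLOBAL_MEM_FENCE);    /* keep load 1 ahead of load 2 */
    return *payload;                         /* load 2: the data */
}
```

Neither store depends on the other, and neither load depends on the other, so I assume that without the fences nothing would stop them from being swapped. Is that the situation the read_ and write_ variants are for?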
I'd welcome an explanation from somebody who understands them better than I.
I would love to see example code that has a problem that is solved with any of the mem_fence functions.
Thanks!
Listing 1:

    float4 body = v_bodies[idx];
    body.w = 0.0f;
    ...
    v_prev_bodies[idx] = body; // save a copy in the vector of previous positions
    v_bodies[idx] = body;      // zero the .w component for correct distance calculation
Thanks, now it makes sense for the read_ and write_ functions.
For the regular mem_fence, it seems that the only way to get into a situation that requires it is to use some kind of casting or pointer magic to access memory. Otherwise the compiler should be able to track the dependencies. Is that correct?
If you had an example of some bad code, it would certainly help illustrate this...
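For what it's worth, the closest I can get without pointer tricks is the sketch below. The two accesses go to different addresses, so there is no dependency for the compiler to track, and the order would only matter to some other work-item. It is a load followed by a store, which as far as I can tell neither read_mem_fence (loads only) nor write_mem_fence (stores only) would cover. The names take_and_release, payload and slot_full are made up, and the #ifndef block is a shim I added so the text also compiles as plain C on the host.

```c
/* Host-side shim (my own addition): an OpenCL compiler defines
   __OPENCL_VERSION__ and skips this block; a plain C compiler uses it. */
#ifndef __OPENCL_VERSION__
#define __global
#define CLK_GLOBAL_MEM_FENCE 0
#define mem_fence(flags) ((void)(flags))
#endif

/* Hypothetical consumer: reads the payload, then marks the slot as free
   so that some producer may overwrite it. */
float take_and_release(const __global float *payload,
                       volatile __global int *slot_full)
{
    float value = *payload;             /* load: should complete first */
    mem_fence(CLK_GLOBAL_MEM_FENCE);    /* orders the load before the store */
    *slot_full = 0;                     /* store: "this slot may be reused" */
    return value;
}
```

If the store were allowed to move ahead of the load, I assume a producer could refill the slot before the old value had been read. Is this the sort of case the plain mem_fence is for, or am I still missing something?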
Thanks!