I have a fairly specific question about how the compiler may reorder my OpenCL source code, and about the side effects of that.
In my kernel, I allocate some local memory and define two pointers that point to the same memory, just for convenience:
__local char blob[BLOB_SIZE]; // BLOB_SIZE is a placeholder for the actual size in bytes
__local int* ptrA = (__local int*)blob;
__local int* ptrB = (__local int*)blob;
Then I have code like this:
int li = get_local_id(0);
// calculations which fill the memory pointed to by ptrA
int result = ptrA[map_id(li)]; // each work-item reads its own value; map_id() remaps li to the actual index in ptrA
// /+/ mem_fence(CLK_LOCAL_MEM_FENCE);
// (*) some code (in my case, debug output to a __global buffer)
if (do_full_calc) {
    // calculations which fill the memory pointed to by ptrB
    // NOTE: result has already been read, so we can overwrite the blob memory here
} else if (result) { // result is used only here
    // do some lightweight calculations
}
This piece of code behaves differently in two cases:
1) If the (*) line is present and /+/ is not, everything is fine, as it should be.
2) If I remove (*) while /+/ is still commented out, some of the kernel output is broken. But the kernel output is OK if I uncomment the /+/ line.
I understand that (*) has no influence on the algorithm, and that adding or removing it can only change how the compiler reorders my code.
So in case 2) I assume that the actual read, result = ptrA[map_id(li)], is moved to after the do_full_calc branch has executed (and ptrA has been overwritten), because I never actually use result before the "if".
On the other hand, if I add (*), the compiler decides (I assume; I have not looked at the generated assembly, as my kernel is quite big) to leave "result = ptrA[map_id(li)]" where it was, and so the kernel output is correct.
mem_fence seems to cure the problem, but I still want to be sure that this is a correct solution and not just a coincidence.
And if it is correct, then I assume that the OpenCL compiler schedules instructions without preserving read/write order (in the case of local memory).
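To make the question concrete, here is a minimal self-contained sketch of the pattern, not my real kernel. The kernel name alias_demo, the BLOB_INTS size, the body of map_id(), the do_full_calc argument and all of the arithmetic are placeholders made up for illustration, and blob is declared as int here only to sidestep alignment concerns. The sketch puts the two candidate fixes side by side: mem_fence() orders only the loads and stores of the work-item that executes it, while barrier() additionally makes every work-item in the work-group wait at that point.

```c
// Hypothetical stand-in for the real kernel: sizes, map_id() and the math are made up.
#define BLOB_INTS 64   // assumes a work-group size of exactly 64

// Placeholder remapping: each work-item reads the slot of its mirror work-item.
int map_id(int li) { return (BLOB_INTS - 1) - li; }

__kernel void alias_demo(__global int* out, int do_full_calc)
{
    __local int blob[BLOB_INTS];
    __local int* ptrA = blob;   // both pointers alias the same local memory
    __local int* ptrB = blob;

    int li = get_local_id(0);

    // Phase 1: every work-item fills its own slot through ptrA.
    ptrA[li] = li + 1;
    barrier(CLK_LOCAL_MEM_FENCE);   // phase-1 writes become visible to the group

    // Each work-item reads a slot that was written by another work-item.
    int result = ptrA[map_id(li)];

    // Candidate fix 1 (the /+/ line): orders only this work-item's own accesses.
    // mem_fence(CLK_LOCAL_MEM_FENCE);

    // Candidate fix 2: also waits until every work-item has finished its read
    // before any of them starts overwriting blob through ptrB.
    barrier(CLK_LOCAL_MEM_FENCE);

    int value = 0;
    if (do_full_calc) {
        // Phase 2: reuse the same memory through ptrB.
        ptrB[li] = 100 * li;
        value = ptrB[li];
    } else if (result) {        // result is used only here
        value = 2 * result;
    }
    out[get_global_id(0)] = value;
}
```

In this sketch do_full_calc is the same for every work-item, so the second barrier() is reached by the whole work-group, which barrier() requires. If the read of result and the later writes through ptrB touch slots owned by different work-items, as they do here, the hazard is between work-items and not only a matter of instruction order inside one work-item.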
Another question: does the "restrict" keyword have any real effect? Does the OpenCL compiler make conservative decisions, assuming that two pointers may always alias the same memory (if they point to the same type, and so on, as in C99)?
Hope that was clear.
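A small sketch of what I mean by the restrict question. Both helper functions are hypothetical and exist only to illustrate the C99 semantics; whether a particular OpenCL compiler actually takes advantage of the qualifier is the open part of the question.

```c
// Hypothetical helpers, made up to illustrate the aliasing question.

// Without restrict: dst and src may overlap, so a conservative compiler has to
// assume that a store to dst[i] could change src[0] and reload it every iteration.
void add_first(__local int* dst, __local const int* src, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] += src[0];
}

// With restrict: the caller promises that dst and src do not overlap, so the
// compiler is allowed to load src[0] once and keep it in a register.
void add_first_restrict(__local int* restrict dst,
                        __local const int* restrict src, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] += src[0];
}

// Calling add_first_restrict(ptrA, ptrB, n) with ptrA and ptrB pointing to the
// same blob would break that promise, and the result would be undefined.
```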