cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

alariq
Adept I

memory aliasing and use of mem_fence

Hello, All

I have a bit of a specific question related to how compiler may rearrange my OpenCL source code and side effect of this.

In my kernel, I am allocating some local memory and define 2 pointers to point to the same memory. Just for convenience.

e.g.

__local char blob[1024];

__local int* ptrA = (__local int*)blob;

__local int* ptrB = (__local int*)blob;

then i have a code like this:

int li = get_local_id(0);

//..

// calculations which fill memory pointed by ptrA

//...

int result = ptrA[map_id(li)]; // each local work item gets its bit of info map_id() does some remapping of li to actual index in ptrA

// /+/ mem_fence(LCK_LOCAL_MEM_FENCE);

// (*) some code (in my case output of debug info to __global buf)

if(do_full_calc)

{

// calculations which fill memory poited by ptrB

// NOTE: result is already read, so we can rewrite blob memory here

} else if(result) // result is used only here

{

// do some lightweight calculations

}

// exit

This piece of code works differently in two cases:

1) If (*) line present, /+/ not present  - everything is fine. Like it should be.

2) If i remove (*), /+/ is still commented - some kernel output is broken. But kernel output is ok if i'll uncomment /+/ line

I understand that (*) has no influence on the algorithm, and adding/removing it only may change how compiler will rearrange my source code.

So in case 2) i assume that actual read result = ptrA[li] is put after do_full_calc branch is executed (and ptrA is owerwritten) , because i actually never use result before the "if".

From the other side if i add (*) compiler decides (i assume, i have not looked at the asm code, my kernel is quite a big) to leave "result = ptrA[li]" where it was and so kernel output is correct.

mem_fence seems to cure the situation. But i still want to be sure that this is a correct solution (not just a coincidence).

And if it is correct, then I assume that OpenCL compiler schedules instruction without preserving read/write order (in case of local mem).

Also another question: is there any real effect of "restrict" keyword? Does OpenCL compiler makes conservative decisions, assuming that two memory pointers may always alias same memory (if they are point to the same type, and so on.. like it is for C99).

Hope it was clear.

Thanks.

ask if this
Détecter la langue » English
Détecter la langue » English
0 Likes
1 Solution

I really admired your clear explanation and I still managed to get it all wrong .

I see that the problem occurs when the extra code (*) is not executed, but the idea is similar. However, I described what happens when multiple waves in a group get out of sync and a barrier instruction is used to bring the waves together. Your question was about mem_fence, not barriers, which are  different. Mem_fence is a compiler directive to ensure all previous memory operations have been committed, which is usually done at compile time, not execution time.

Sorry for the confusion. But the answer for mem_fence is similar. If the compiler reordered the reads and writes, it could cause corruption that can be fixed by using mem_fence to prevent reordering. Some other block of code inserted at the same place may do the same. A barrier using the CLK_LOCAL_MEM_FENCE flag will also prevent the reordering, even if the barrier is not useful for a single wave.

View solution in original post

0 Likes
11 Replies
binying
Challenger

"mem_fence seems to cure the situation."---Do you mean with mem_fence, your code works fine? I should say it is safer with mem_fence with the information you give.

0 Likes

yes with mem_fence it works fine.  Actually read_mem_fence is enough here.

P.S. However, i surprised that program works even with write_meme_fence() (Just put this out of curiosity)

Détecter la langue » English
0 Likes
drallan
Challenger

Yes, this is a correct use of mem_fence, which is preventing the data corruption.

The reason the corruption only occurs when you export debug information is because writing debug information can cause rather large and unpredictable delays for different waves. This means some waves can write using pointer B before others have read pointer A. The is particularly true of  things like the printf() function. When you don't write debug information, the program works in part due to luck. The mem_fence is recommended.

0 Likes

Hi, drallan.

I understand your point. The only thing is that my program works fine if output to debug buffer is present. So, maybe this wait actually makes my program to work correctly for some reason.

  I, probably, described option 2) in a bit confusing manner, by saying "remove (*)" i meant completely removing debug output, not uncommenting it).

cures
Détecter la langue » English
0 Likes

Also as far as i understand different waves can never write to same areas of local memory (exception is the case when work group size is > 64  and thus work group actually has several waves). In my case work group size is 64 and whole group consists of one wavefront.

only
Détecter la langue » English
0 Likes

I really admired your clear explanation and I still managed to get it all wrong .

I see that the problem occurs when the extra code (*) is not executed, but the idea is similar. However, I described what happens when multiple waves in a group get out of sync and a barrier instruction is used to bring the waves together. Your question was about mem_fence, not barriers, which are  different. Mem_fence is a compiler directive to ensure all previous memory operations have been committed, which is usually done at compile time, not execution time.

Sorry for the confusion. But the answer for mem_fence is similar. If the compiler reordered the reads and writes, it could cause corruption that can be fixed by using mem_fence to prevent reordering. Some other block of code inserted at the same place may do the same. A barrier using the CLK_LOCAL_MEM_FENCE flag will also prevent the reordering, even if the barrier is not useful for a single wave.

0 Likes

Yeah, thanks I totally agree with your description.

vuyskshzeshschtyu
Détecter la langue » English
0 Likes

The following violates ansi C aliasing so that may explain why mem_fence "fixes" it because it may have prevented certain memory moves...

__local char blob[1024];

__local int* ptrA = (__local int*)blob;

__local int* ptrB = (__local int*)blob;

I'm guessing that "debug info" has a use of "result"; therefore, it forces a read into "result" before the local memory gets clobbered by the operations in "if(do_full_calc)".

Without the fence and the debug stuffs, it possible that the compiler could differ the load into "result" when it reaches "else if(result)"

0 Likes

Hello, siu

I agree with you that: "it possible that the compiler could differ the load into "result" when it reaches "else if(result)""

but I also must say that "debug info" does not uses result in any way (if fact it is completely independent of "result")

And also i can't understand why is the above code violates C99 aliasing rules. AFAIK "char*"  can alias anything.

And regarding pA & pB they are pointers to the same type, so can alias each other is n't it?

Thanks.

0 Likes

I think you can cast any pointer to char* but not the other way around.

I agree with you that pA and pB are aliased.  Without the full source code, it's hard to determine what exactly is happening...

0 Likes

Anyway, thanks for help!

Détecter la langue » English
0 Likes