
t-man
Adept II

Different values in threads of the same global variable

So, I am dealing with the following situation. I want all my threads to keep looping as long as none of them has reached a certain "if" statement, so I use a global variable that holds 0 and is set to 1 when that if statement is reached.

The problem is that at the end of the loop I use a barrier to synchronize all the threads (in my case I only have 4), and only the threads that reached the if see the global variable as 1; the others still see 0.

How can that be? How can a global variable have different values in different threads?

LeeHowes
Staff

Because that is how memory consistency works. OpenCL C, and indeed C itself, do not require an arbitrary write to memory to become visible until certain guarantees are met. In C11 and C++11 those guarantees come from atomics, although marking a variable as volatile is usually interpreted that way too (with some caveats). What that means is that even if you think you are writing to a memory address, the compiler has no requirement to actually make that write visible, and because making memory visible to other work items or (worse) other threads is expensive, it is free to optimize the write away and keep the data in a register until the point at which it must become visible.
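Since C11 atomics are mentioned above, here is a CPU-side analogy of the original poster's pattern, written in C11 with POSIX threads rather than OpenCL. It is a minimal sketch: the shared flag is an atomic_int, so the spinning threads are required to re-read it on every iteration and the store by the thread that "reaches the if" must eventually become visible to them. The names STOP_AFTER, worker and run_workers are hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define STOP_AFTER 1000 /* hypothetical: iteration at which thread 0 "reaches the if" */

/* Shared flag. Because it is _Atomic, every atomic_load re-reads memory
   and every atomic_store must become visible to the other threads;
   a plain int could legally be kept in a register instead. */
static atomic_int found;
static int saw_flag[NUM_THREADS];

static void *worker(void *arg) {
    int id = (int)(long)arg;
    int n = 0;
    /* Keep looping while no thread has reached the stop condition. */
    while (atomic_load(&found) == 0) {
        if (id == 0 && ++n == STOP_AFTER)
            atomic_store(&found, 1); /* this thread "reaches the if" */
    }
    saw_flag[id] = atomic_load(&found); /* record what this thread observed */
    return NULL;
}

/* Runs NUM_THREADS workers and returns how many of them observed found == 1. */
int run_workers(void) {
    pthread_t t[NUM_THREADS];
    atomic_store(&found, 0);
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    int count = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(t[i], NULL); /* join also makes saw_flag[i] safe to read */
        count += saw_flag[i];
    }
    return count;
}
```

Calling run_workers() from a main function should return 4: a thread can only leave the while loop after it has loaded found == 1, so every thread ends up seeing the same value.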

The compiler can be quite aggressive about this. You could try replacing your reads and writes with atomic operations, for example an atomic compare-and-swap in a loop:

while(atomic_cmpxchg(&location, 1, 1) != 1){...}  // stores 1 only where 1 already is, so this just reads the flag atomically

That forces every iteration to access memory.
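One detail worth spelling out: OpenCL's atomic_cmpxchg(p, cmp, val) is a read-modify-write. It stores val only if *p equals cmp, and in every case it returns the value *p held before the call. The sketch below emulates that behavior on the CPU with C11's atomic_compare_exchange_strong; cmpxchg_old is a hypothetical helper name, not an OpenCL or C library function.

```c
#include <stdatomic.h>

/* Emulates OpenCL C's atomic_cmpxchg(p, cmp, val) with C11 atomics:
   if *p == cmp, store val; either way, return the value *p held before. */
int cmpxchg_old(atomic_int *p, int cmp, int val) {
    int expected = cmp;
    /* On failure, C11 writes the current value of *p into expected.
       On success, expected still equals cmp, which was the old value. */
    atomic_compare_exchange_strong(p, &expected, val);
    return expected;
}
```

Starting from a flag of 0, cmpxchg_old(&f, 1, 1) returns 0 and leaves the flag at 0, so it behaves as an atomic read. cmpxchg_old(&f, 0, 1) also returns 0 but sets the flag to 1, so a polling loop built on that form raises the flag itself instead of waiting for another thread to do it.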

I may be misinterpreting your post, though, given that you talk about barriers and barriers should enforce visibility. Maybe you could show us the code?
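To illustrate the point about barriers, here is another CPU-side analogy using POSIX pthread_barrier_t (again not OpenCL code; flag, worker and count_threads_seeing_flag are hypothetical names). A write made by one thread before the barrier is visible to every thread after it, even though flag is a plain int, because pthread_barrier_wait is one of the POSIX functions that synchronize memory. One difference to keep in mind: OpenCL's barrier() only synchronizes the work-items of a single work-group, so it gives no such guarantee between different work-groups.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define NUM_THREADS 4

static pthread_barrier_t barrier;
static int flag; /* plain int: ordering is provided only by the barrier */
static int seen[NUM_THREADS];

static void *worker(void *arg) {
    int id = (int)(long)arg;
    if (id == 0)
        flag = 1;                   /* only one thread writes */
    pthread_barrier_wait(&barrier); /* every thread waits here */
    seen[id] = flag;                /* the write above is now visible */
    return NULL;
}

/* Runs NUM_THREADS workers and returns how many saw flag == 1 after the barrier. */
int count_threads_seeing_flag(void) {
    pthread_t t[NUM_THREADS];
    flag = 0;
    pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    int count = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(t[i], NULL);
        count += seen[i];
    }
    pthread_barrier_destroy(&barrier);
    return count;
}
```

Here all 4 threads participate in the same barrier, so count_threads_seeing_flag() should return 4.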

Yes, that's exactly the answer I was looking for. Thank you for your fast response. I was afraid that a thread might keep the value in a register somewhere, so now I am going to replace everything with atomic operations. I hope it works!


"{                                                   \n" \
"   int i = get_local_id(0);                         \n" \
"   int glob = get_global_id(0);                     \n" \
"   int j = get_num_groups(0);                       \n" \
"   __local int k; k = get_group_id(0);              \n" \
"   __local int size; size = get_local_size(0);      \n" \
"   atomic_xchg(sem, 0);                             \n" \
"   if (i == 0) {                                    \n" \
"      atomic_xchg(found, 1); }                      \n" \
"   barrier(CLK_GLOBAL_MEM_FENCE);                   \n" \
"   BC[glob] = atomic_xchg(found, 1);                \n" \

For this example, I launched a kernel with 2 workgroups and 2 threads per workgroup, but only thread 0 of each group has found = 1 in BC. Any idea why? found is of type __global int* and BC is of type __global double*. The memory allocated for found is sizeof(int), and BC is an array of doubles.
