OpenCL

avinashkrc
Adept I

Getting stuck in a loop: are local variables not visible to other work-items in a work-group?

This is my kernel code:

__kernel void test(__global int *input_vector, __global atomic_int *mem_flag)
{
    local int d[32];
    if (get_local_id(0) == 0) {
        d[0] = 100;
    }
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
    while (1) {
        if (get_local_id(0) == 0) {
            d[0] = 0;
            break;
        }
        else {
            if (d[0] == 0)
                break;
        }
    }
}

Inside the while loop, the else branch has an if condition that expects d[0] to become 0 at some point in time, but that never happens. I am under the assumption that local variables are visible to all work-items in the work-group. I am using an APU A12-9800, Ubuntu 14.04, fglrx.

8 Replies
dipak
Big Boss

I think atomics and synchronization operations should be used in the above case. These operations play a special role in making assignments in one work-item visible to another.
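A minimal sketch of what that could look like (my own illustration, not code from this thread), using OpenCL 2.0 explicit atomics with work-group scope so a store in work-item 0 becomes visible to the others. Note that OpenCL gives no independent forward-progress guarantee between work-items, so a spin loop like this can still hang on some hardware:

```c
// Hypothetical fragment (OpenCL 2.0 C), not from this thread: an explicit
// atomic flag with work_group scope. The release store pairs with the
// acquire loads in the other work-items.
local atomic_int flag;

if (get_local_id(0) == 0)
    atomic_init(&flag, 0);
barrier(CLK_LOCAL_MEM_FENCE);           // everyone sees the initialized flag

if (get_local_id(0) == 0) {
    atomic_store_explicit(&flag, 1,
                          memory_order_release,
                          memory_scope_work_group);
} else {
    // CAUTION: OpenCL does not guarantee independent forward progress,
    // so this spin can still hang if work-item 0 is never scheduled.
    while (atomic_load_explicit(&flag,
                                memory_order_acquire,
                                memory_scope_work_group) == 0)
        ;
}
```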

Thanks.


Hi dipak,

I changed the code to this:

local atomic_int d[32];
if (get_local_id(0) == 0) {
    atomic_store(&d[0], 100);
}
barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
while (1) {
    if (get_local_id(0) == 0) {
        atomic_store(&d[0], 0);
        break;
    }
    else {
        if (atomic_load(&d[0]) == 0)
            break;
    }
}

The same problem still persists. Any other thoughts?


Please attach the complete code (host+kernel). I'll check and share my observation.

Thanks.


Hi dipak, the code is attached now. However, the attached OpenCL kernel is the updated one with the barrier code, without atomics. If you want to check the atomic version, just replace the kernel code with the one mentioned above.


However, this worked for me

__kernel void test(__global int *input_vector, __global atomic_int *mem_flag)
{
    local int d[32];
    if (get_local_id(0) == 0) {
        d[0] = 100;
    }
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
    while (1) {
        mem_fence(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
        if (d[0] == 0)
            break;
        if (get_local_id(0) == 0) {
            d[0] = 0;
        }
    }
}


After doing some experiments with the previous kernels, here are my findings:

  • The outcome of the program depends on which path inside the while loop is chosen first by the compiler. When the "if" branch is executed first, d[0] is already set to 0 before it is tested in the "else" branch. But if the "else" branch (i.e. get_local_id(0) > 0) is executed first, it spins forever waiting for d[0] to become 0 and causes the hang.
  • A barrier at the end of the while loop can force both branches to merge after each iteration, so the infinite "else" loop should not happen.
  • As I observed, a fence instruction also worked here, because the "if" branch happened to be chosen first in that case.
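
To illustrate the barrier-at-the-end point above, here is a hypothetical restructuring (my own sketch, not tested on the A12-9800): every work-item executes the same barrier in every iteration, and all of them exit at the same point, which avoids both the undefined barrier behaviour and the dependence on execution order:

```c
// Sketch only: the flag is written before a barrier that every work-item
// reaches, then all work-items test it and break at the same point.
__kernel void test(__global int *input_vector, __global atomic_int *mem_flag)
{
    local int d[32];
    if (get_local_id(0) == 0) {
        d[0] = 100;
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    while (1) {
        if (get_local_id(0) == 0) {
            d[0] = 0;                  // only one work-item writes the flag
        }
        barrier(CLK_LOCAL_MEM_FENCE);  // all work-items hit this together
        if (d[0] == 0)
            break;                     // and all of them break here
    }
}
```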

Hi dipak,

A barrier, whether at the start or the end of the while loop, hangs if the code is written as below, where work-items terminate at different points, i.e. one work-item breaks inside the if statement while the others break inside the else statement. I believe this is expected, since some work-items hit the barrier and others don't:

while (1) {
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) == 0) {
        d[0] = 0;
        break;
    } else {
        if (d[0] == 0)
            break;
    }
}

However, it works only if all work-items break at the same point inside the same conditional statement, as in the working code above (even if we replace mem_fence with barrier).


Yes, you are right about the barrier usage. If a barrier is used, then all work-items in a work-group must hit it; otherwise the result is undefined and not portable.

A couple of points to note, though. In general, work-items have no execution-order guarantee. A barrier provides some level of ordering guarantee for work-items within a work-group: it combines thread synchronization with a memory fence. A memory fence by itself does not provide any ordering guarantee.

Depending on the hardware, an implementation may execute many work-items in parallel or one work-item at a time (say, on a single-core CPU). So if the outcome of a kernel depends on the execution order of work-items, it may lose portability, and special attention is required when a different hardware/implementation is selected.