This is my kernel code:
__kernel void test(__global int *input_vector, __global atomic_int *mem_flag)
{
    local int d[32];

    if (get_local_id(0) == 0) {
        d[0] = 100;
    }
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);

    while (1) {
        if (get_local_id(0) == 0) {
            d[0] = 0;
            break;
        } else {
            if (d[0] == 0)
                break;
        }
    }
}
Inside the while loop, in the else branch, there is an if condition expecting d[0] to equal 0 at some point, but that never happens. I am under the assumption that local variables are visible to all work-items in the work-group. I am using an A12-9800 APU, Ubuntu 14.04, fglrx.
However, this worked for me:
__kernel void test(__global int *input_vector, __global atomic_int *mem_flag)
{
    local int d[32];

    if (get_local_id(0) == 0) {
        d[0] = 100;
    }
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);

    while (1) {
        mem_fence(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
        if (d[0] == 0)
            break;
        if (get_local_id(0) == 0) {
            d[0] = 0;
        }
    }
}
I think atomics and synchronization operations should be used in a case like this. These operations play a special role in making writes by one work-item visible to other work-items.
Thanks.
Hi dipak,
I changed the code to this:
local atomic_int d[32];

if (get_local_id(0) == 0) {
    atomic_store(&d[0], 100);
}
barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);

while (1) {
    if (get_local_id(0) == 0) {
        atomic_store(&d[0], 0);
        break;
    } else {
        if (atomic_load(&d[0]) == 0)
            break;
    }
}
The same problem still persists. Any other thoughts?
Please attach the complete code (host + kernel). I'll check and share my observations.
Thanks.
Hi dipak, the code is attached now. However, the OpenCL kernel is the updated one with the barrier code, without atomics. If you want to check the atomic version, just replace the kernel code with the one mentioned above.
Hi dipak,
After doing some experiments with the previous kernels, here are my findings:
A barrier, whether at the start or at the end of the while loop, does not work if the code is written as below, where work-items terminate at different lines (one work-item breaks inside the if statement while the others break inside the else statement). I believe this is expected, since some work-items hit the barrier and others don't:
while (1) {
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) == 0) {
        d[0] = 0;
        break;
    } else {
        if (d[0] == 0)
            break;
    }
}
However, it works only when all work-items break at the same line inside the same conditional statement, as in the working code above (even if mem_fence is replaced with barrier).
Yes, you are right about the barrier usage. If a barrier is used, then all work-items in a work-group must hit it; otherwise the result is undefined and not portable.
A couple of points to note, though. In general, work-items have no execution-order guarantee. A barrier provides some level of ordering for work-items within a work-group: it combines thread synchronization with a memory fence. A memory fence by itself does not provide any ordering guarantee.
Depending on the hardware, an implementation may execute many work-items in parallel or one work-item at a time (say, on a single-core CPU). So if the outcome of a kernel depends on the execution order of the work-items, it may lose portability, and special attention is required when a different hardware/implementation is chosen.