The following kernel when run on a CPU device causes the error_buffer to return non-zero values:
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
__kernel void test(__global int* error_buffer) { __local int fire_table_idx; int local_id = get_local_id(0); if(local_id == 0) fire_table_idx = 0; barrier(CLK_LOCAL_MEM_FENCE); if(fire_table_idx != 0) error_buffer[0] = fire_table_idx; else error_buffer[0] = 0; atom_inc(&fire_table_idx); }
If I remove the call to atom_inc at the bottom, the kernel functions fine (i.e. fire_table_idx is set to 0).
Any ideas as to why this might be happening? Am I not using __local variables correctly?
this is maybe related to non array local variable bug. as workaround change it to
fire_table_idx to array like
__local int fire_table_idx[1];
fire_table_idx[0] = 0; //etc
SiegeLord,
the extension name is #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable.
I am not able to understand what do you want to do with this kernel.You are changing the value of same variable error_buffer[0] inside every work group.
Originally posted by: nou this is maybe related to non array local variable bug. as workaround change it to
fire_table_idx to array like __local int fire_table_idx[1]; fire_table_idx[0] = 0; //etc
That did not help, unfortunately.
Originally posted by: himanshu.gautam
the extension name is #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable.
No. That does not enable 32 bit local atomics, which is what I use. With your extension, there are link time errors when compiling the kernel.
Originally posted by: himanshu.gautam
I am not able to understand what do you want to do with this kernel.You are changing the value of same variable error_buffer[0] inside every work group.
This kernel is just to highlight the bug. The bug does not go away if you assign to different entries of the error_buffer.
I have managed to come up with something that I think works, here it is:
__kernel void test(__global int* error_buffer)
{
int local_id = get_local_id(0);
int global_id = get_global_id(0);
__local int fire_table_idx;
if(local_id == 0)
fire_table_idx = 0;
barrier(CLK_LOCAL_MEM_FENCE);
error_buffer[global_id] = fire_table_idx;
atom_inc(&fire_table_idx);
fire_table_idx = 0; // <- set it explicitly to 0 BEFORE any more calls to barrier
barrier(CLK_LOCAL_MEM_FENCE); // <- has to be after the previous line
}
Setting it explicitly to 0 worked. In my real function there is also a barrier after initialization... I found that the barrier has to go after you reset the fire_table_idx to 0, or this fix won't work.
Still... this fix seems a tad arbitrary...
Actually... no. While this fix works for this simple kernel, it doesn't work for the real kernel I use. Anyone have any other suggestions?
Incidentally... while this specific bug appears only on the CPU device, the GPU also gets errors from this local atomic/barrier combination, just not in this simple function.
Please send a test case to streamdeveloper@amd.com
siegelord,
I checked out the kernel you provided.
I do not find any stange behaviour about it when i run it on CPU.
Can you explain your problem?
I think I solved it (thanks to jcornwall on #opencl). What I needed was another barrier in that function to prevent one thread incrementing the local variable before some other thread reaches the check for it being 0. Here's the fixed kernel:
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
__kernel void test(__global int* error_buffer)
{
__local int fire_table_idx;
int local_id = get_local_id(0);
if(local_id == 0)
fire_table_idx = 0;
barrier(CLK_LOCAL_MEM_FENCE);
if(fire_table_idx != 0)
error_buffer[0] = fire_table_idx;
else
error_buffer[0] = 0;
barrier(CLK_LOCAL_MEM_FENCE);
atom_inc(&fire_table_idx);
}
I think that makes my code run without bugs for now... but I'll reply back if I get a bug related to this again.