I've run into a weird bug in one of my kernels. I reproduced it with this simple kernel:
// data is initialized as {0, 0, 0}
// x has size 128 and is not initialized to anything
__kernel void testBarrier(__global int *data, __local uint *x) {
    x[0] = 1;
    x[63] = 1;
    x[64] = 1;
    barrier(CLK_LOCAL_MEM_FENCE);
    data[0] = x[0];
    data[1] = x[63];
    data[2] = x[64];
}
When I then print data on the host side, I see:
1, 1, 0
If I comment out "barrier(CLK_LOCAL_MEM_FENCE);", I see:
1, 1, 1
The number of work items and the work-group size don't seem to matter; I even tried a single work item. I'm pretty sure I'm using the latest drivers and APP SDK, but I can double-check if it matters.
Any idea as to what's going on?
I don't recall having that problem myself. That said, you're using constant indices across different work items (WIs)... can you post the whole thing?
I'll post more when I get home from work; I should have a chance tonight. Also, this happened with one work item as well as with many, so the constant indices shouldn't have been the issue.
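It turns out that x was too small, so some of those writes were out of bounds. Writing past the end of a __local buffer is undefined behavior, and the barrier with CLK_LOCAL_MEM_FENCE just happened to wipe the out-of-bounds value.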
Was it 128 bytes (rather than 128 uints)?