This is a bug in your code, not in OpenCL: you have a race condition in your writes.
The reason it works on the CPU is that each work-item is run sequentially; on the GPU they are run in parallel.
Thank you for your answer. Does OpenCL not allow writing to two components of the same vector from two threads?
How about local memory? Can I do this in a kernel?
__local int2 localData[N]; // N is a placeholder size; an array is needed for the indexing below
localData[a].x = num0;
localData[b].y = num1;
That is, supposing different threads may write to the same vector?
Yes, it does permit it, but what he is saying is that the whole problem with your first code is that it allows multiple threads to write to the same vector. I am assuming that, because it is a vector type, the compiler loads the entire vector from data[a] (both the x and y components), modifies the .x component (leaving the .y value unchanged), and then writes both back out (whether through a cache or not). If another thread is modifying data[b].y, which happens to be the same vector address, it reads in the original, unmodified value of .x and writes it out again, cancelling the first thread's result. This can happen both ways. This is a guess, of course.
I assume the problem is inconsistent across platforms not because of the serial nature of the CPU (it is at least parallel across multiple cores, right? So you could have the same problem), but rather because of how the compiler handles loading and storing vector types on different devices. On the GPU it does not attempt to address a single individual component, but deals with writes at vector scale, and the x, y and z components simply reflect register organisation. On the CPU, data[a].x is translated to a single 32-bit int load/store.
I say this because how else could you explain write-only race conditions (with no dependent reads) resulting in the data not being modified by any of the competing threads?
Thank you antzrhere, I agree with you!