Gunter · ‎08-22-2011

Hi,

I have a question concerning LDS operation.

If I use lds_store_vec_id(0) mem.xyzw ... in my kernel, and two or more (or all) threads write to the same address in LDS, what will be in the LDS locations in question afterwards?

Is it: xyzw from that particular thread that happened to write last,

or possibly x from one thread, y from another, z from a third, etc.?

Thx

LeeHowes · ‎08-22-2011

Vector writes are almost certainly not atomic, so I would say it's completely unpredictable what you'd get. I'm pretty sure 32-bit writes are atomic, so you should get valid words, just not necessarily all from the same work item.

Gunter · ‎08-22-2011

Thanks for your answer.

However, I wasn't referring to atomic (read-modify-write) instructions. I just wanted to know the following: if simultaneous writes to the same address occur, something has to determine the order in which these writes are executed. If vectors are written, are the four components always scheduled at the same time, or can individual components be written at different times? Or, more importantly, does the final content always come from only one vector?

Thx

LeeHowes · ‎08-22-2011

I know you weren't. As I said, the vector write is probably not atomic. That is, the write will be issued as four separate scalar writes in the memory system, which might be independently reordered. Hence you cannot guarantee that the entire vector will come from a single work item. Within a hardware thread (wavefront) there might be an ordering guarantee (with increasing work item ID, say), but that's certainly not guaranteed by the OpenCL spec.
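To make the reordering argument concrete, here is a minimal host-side sketch in plain C++ (not IL or OpenCL, and not a model of any real memory system). It assumes a vector store is split into four scalar stores, as suggested above, and simply applies them in a given order. The names ScalarStore, apply_stores, is_consistent and the two example orderings are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// One scalar (32-bit) store: which work item writes which component.
struct ScalarStore {
    int work_item;
    int component;  // 0..3 stands for x, y, z, w
};

// Apply scalar stores in the given order to one four-component location.
// Each component ends up holding the ID of the work item that wrote it last.
std::array<std::uint32_t, 4> apply_stores(const std::vector<ScalarStore>& order) {
    std::array<std::uint32_t, 4> cell{};
    for (const ScalarStore& s : order) {
        cell[s.component] = static_cast<std::uint32_t>(s.work_item);
    }
    return cell;
}

// True if all four components came from the same work item.
bool is_consistent(const std::array<std::uint32_t, 4>& cell) {
    return cell[0] == cell[1] && cell[1] == cell[2] && cell[2] == cell[3];
}

// Work item 1 stores its whole vector, then work item 2 stores its own.
std::vector<ScalarStore> in_order_example() {
    return {{1, 0}, {1, 1}, {1, 2}, {1, 3}, {2, 0}, {2, 1}, {2, 2}, {2, 3}};
}

// A hypothetical interleaving in which the scalar stores of the two
// work items are reordered independently per component.
std::vector<ScalarStore> reordered_example() {
    return {{1, 0}, {2, 0}, {2, 1}, {1, 1}, {1, 2}, {2, 2}, {2, 3}, {1, 3}};
}
```

With the in-order sequence every component holds 2, so the vector is consistent. With the reordered sequence the location ends up as (2, 1, 2, 1): each word is valid, but the vector as a whole belongs to neither work item.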

Gunter · ‎08-23-2011

Thanks everybody, I now have a much clearer picture.

Final question: if I use the scalar version lds_store_id(), and two or more threads write to the same DWORD in LDS, what will be in that DWORD afterwards: any one of the values written by the threads, the previous value, or will it be undefined?

Thx

malcolm3141 · ‎09-05-2011

The current behaviour that I have found (on NVIDIA and ATI, interestingly) is that writes commit in thread ID order. That is, the thread with the highest ID wins.

I should stress that this behaviour is not guaranteed, and specifically may be totally different on future hardware. If you want to rely on it, test it first. Also, without atomic loads/stores or the appropriate fences, the compiler may make transformations that break the ordering as well (since it isn't guaranteed by any specification).
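If "highest ID wins" is the result you actually want, one way to get it without depending on hardware commit order is to elect the writer explicitly and let only that work item store its vector. Below is a minimal host-side sketch in plain C++ (not IL or OpenCL). The helpers propose, store_if_winner and run_group are hypothetical names, and the serial driver only stands in for a work-group. On a GPU the election would be an atomic max on a local-memory word, with a work-group barrier between the two phases.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <vector>

using Vec4 = std::array<std::uint32_t, 4>;

// Phase 1: every work item proposes its own ID. The compare-exchange loop
// keeps the largest ID seen, whatever order the proposals arrive in.
void propose(std::atomic<int>& winner, int id) {
    int seen = winner.load();
    while (seen < id && !winner.compare_exchange_weak(seen, id)) {
        // On failure compare_exchange_weak reloads `seen`; retry while id is larger.
    }
}

// Phase 2: only the elected work item stores its vector, so all four
// components are guaranteed to come from the same source.
void store_if_winner(const std::atomic<int>& winner, int id,
                     const Vec4& value, Vec4& shared) {
    if (winner.load() == id) {
        shared = value;
    }
}

// Serial stand-in for a work-group. `order` is the arbitrary order in which
// the work items happen to execute each phase.
Vec4 run_group(const std::vector<int>& order) {
    std::atomic<int> winner{-1};
    Vec4 shared{};
    for (int id : order) {
        propose(winner, id);  // phase 1
    }
    // A work-group barrier would separate the two phases on real hardware.
    for (int id : order) {    // phase 2
        Vec4 mine;
        mine.fill(static_cast<std::uint32_t>(id));  // each item's vector is its ID repeated
        store_if_winner(winner, id, mine, shared);
    }
    return shared;
}
```

Calling run_group with the IDs 0 to 3 in any order leaves the vector of work item 3 in the shared location, because the outcome depends only on the election and not on who happened to run last.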

jeff_golds · ‎08-22-2011

You can't rely on any order whatsoever unless you use barriers. It's your job to make sure threads don't overwrite each other's data.

Jeff