6 Replies Latest reply on Sep 5, 2011 12:22 AM by malcolm3141

    LDS operation

    Gunter

      Hi,

      I have a question concerning LDS operation.

      If I use lds_store_vec_id(0) mem.xyzw ... in my kernel, and two or more (or all) threads write to the same address in LDS, what will be in the LDS locations in question afterwards?

      Is it: xyzw from that particular thread that happened to write last,

      or possibly x from one thread, y from another, z from the third etc. ?

      Thx

       

        • LDS operation
          LeeHowes

          Vector writes are almost certainly not atomic, so I would say it's completely unpredictable what you'd get. I'm pretty sure 32-bit writes are atomic, so you should get valid words, just not necessarily all from the same work item.

          • LDS operation
            Gunter

            Thanks for your answer.

            However, I wasn't referring to atomic (read-modify-write) instructions. I just wanted to know the following: if simultaneous writes to the same address occur, something has to determine the order in which these writes are executed. If vectors are written, are the four components always scheduled at the same time, or can individual components be written at different times? Or, more importantly, is the last write always from only one vector?

            Thx

             

              • LDS operation
                LeeHowes

                I know you weren't. As I said, the vector write is probably not atomic. That is, the write will be issued as four separate scalar writes in the memory system, which might be independently reordered. Hence you cannot guarantee that the entire vector will come from a single work item. Within a hardware thread (wavefront) there might be an ordering guarantee (with increasing work-item ID, say), but that's certainly not guaranteed by the OpenCL spec.
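
To make the reordering argument above concrete, here is a minimal host-side sketch in Python. It is only a model, not AMD IL and not a description of what any particular hardware does: it assumes, as suggested above, that a 4-component store is split into four independent scalar writes and that each scalar write is itself indivisible. The work-item values and the `apply_writes` helper are made up for illustration.

```python
def apply_writes(order, values, width=4):
    """Apply scalar writes in the given order.

    Each entry in `order` is (work_item, component); the last write
    to a component determines its final value.
    """
    lds = [None] * width
    for item, comp in order:
        lds[comp] = values[item][comp]
    return tuple(lds)

# Two hypothetical work items storing a 4-component vector to the same address.
values = {0: (10, 11, 12, 13), 1: (20, 21, 22, 23)}

# Case 1: each vector store completes as a unit (item 0, then item 1).
whole = [(0, c) for c in range(4)] + [(1, c) for c in range(4)]
result_whole = apply_writes(whole, values)

# Case 2: the eight scalar writes are interleaved independently.
torn = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (1, 2), (1, 3), (0, 3)]
result_torn = apply_writes(torn, values)

print(result_whole)  # (20, 21, 22, 23)
print(result_torn)   # (20, 11, 22, 13)
```

In the second case every component is still a valid word written by one of the work items, but the vector as a whole belongs to neither of them.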

                  • LDS operation
                    Gunter

                    Thanks everybody, I now have a much clearer picture.

                    Final question: if I use the scalar version lds_store_id(), and two or more threads write to the same DWORD in LDS, what will be in that DWORD afterwards: any one of the values written by the threads, the previous value, or will it be undefined?

                    Thx

                      • LDS operation
                        malcolm3141

                        The current behaviour that I have found (on both nVidia and ATI, interestingly) is that writes commit in thread ID order. That is, threads with a higher ID win.

                        It should be stressed that this behaviour is not guaranteed and may be totally different on future hardware. If you want to rely on it, test it first. Also, without atomic loads/stores or the appropriate fences, the compiler may make transformations that break the ordering as well (since it isn't guaranteed by any specification).
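
A small Python model of the scalar case may help separate what seems safe to assume from what does not. It assumes each DWORD write is indivisible (as suggested earlier in the thread, not as a documented guarantee) and that every colliding write eventually lands. The thread IDs, the values, and the `final_value` helper are hypothetical.

```python
from itertools import permutations

# Hypothetical values written to the same DWORD by three threads.
written = {0: 100, 1: 200, 2: 300}

def final_value(order, initial=-1):
    """Return the DWORD contents after the writes commit in `order`."""
    dword = initial
    for tid in order:
        dword = written[tid]  # each write replaces the whole DWORD
    return dword

# With no ordering guarantee, any commit order is possible.
outcomes = {final_value(p) for p in permutations(written)}

# The observed (but unguaranteed) behaviour: commit in thread ID order.
id_order = final_value(sorted(written))

print(sorted(outcomes))  # [100, 200, 300]
print(id_order)          # 300
```

Under these assumptions the DWORD always ends up holding one of the written values, never the previous contents or a mix, but which one depends on an order that no specification fixes.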

                    • LDS operation
                      jeff_golds

                       

                      Originally posted by: Gunter

                      Thanks for your answer.

                      However, I wasn't referring to atomic (read-modify-write) instructions. I just wanted to know the following: if simultaneous writes to the same address occur, something has to determine the order in which these writes are executed. If vectors are written, are the four components always scheduled at the same time, or can individual components be written at different times? Or, more importantly, is the last write always from only one vector?

                      You can't rely on any order whatsoever unless you use barriers. It's your job to make sure threads don't overwrite each other's data.

                      Jeff
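
As a rough host-side analogy of that advice, the sketch below uses Python threads and `threading.Barrier` in place of GPU work items and a work-group barrier (in OpenCL C that would be `barrier(CLK_LOCAL_MEM_FENCE)`). The group size, the `lds` list, and the `work_item` function are assumptions for illustration only. Each work item writes only its own slot, and nobody reads until everyone has passed the barrier.

```python
import threading

NUM_ITEMS = 8                      # hypothetical work-group size
lds = [0] * NUM_ITEMS              # stand-in for LDS: one slot per work item
results = [None] * NUM_ITEMS
barrier = threading.Barrier(NUM_ITEMS)

def work_item(tid):
    # Phase 1: write only to this work item's own slot, so no two
    # work items ever target the same address.
    lds[tid] = tid * tid
    # All writes must be complete before anyone reads.
    barrier.wait()
    # Phase 2: reading other work items' slots is now well defined.
    results[tid] = sum(lds)

threads = [threading.Thread(target=work_item, args=(t,)) for t in range(NUM_ITEMS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [140, 140, 140, 140, 140, 140, 140, 140]
```

Because the slots are disjoint and the barrier separates the write phase from the read phase, the result does not depend on the order in which the writes happen.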