10 Replies Latest reply on Jul 10, 2010 6:58 PM by LeeHowes

    Racing in memory writ

    rotor
      memory access

      Hi folks,

      I would like to discuss a race condition I am seeing in OpenCL memory access. The ATI/Nvidia tech docs specify that when multiple work-items try to access the same memory location, the memory controller queues the accesses. In my experience, however, that queue only helps with reads, not with writes. In my OpenCL apps, when multiple work-items try to write (update) the contents of the same memory location, a race condition occurs and produces wrong results. I am currently running on the 48xx series.

      Has anyone faced the same problem? Is there a solution to this problem in general, and/or is there any improvement in the memory controller architecture of the 58xx and Fermi series that avoids it?

      Thanks,

      Roto

        • Racing in memory writ
          MicahVillmow
          Rotor,
          This is not an issue unique to OpenCL; it is an issue any time you have multiple threads/work-items/processes accessing the same location in memory. The only solution is to force serialization of accesses to memory that is touched by multiple work-items. On the 5XXX series of cards you can use atomic operations to serialize access; on the 4XXX series of cards these atomics don't exist. Unless you have some method of serializing writes, which the 4XXX series does not support, the results will never be deterministic. It is not the job of the memory controller to guarantee determinism in the case of a race condition at the kernel level.
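To make the point about serialization concrete, here is a minimal host-side C sketch (plain C11, not OpenCL kernel code). It replays by hand the worst-case interleaving in which every work-item reads the shared location before any of them writes it back, and then repeats the same increments with C11 `atomic_fetch_add`, which plays roughly the role that OpenCL atomics such as `atomic_add` play on hardware that supports them. The function names and the `MAX_ITEMS` limit are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_ITEMS 64  /* hypothetical upper bound for this sketch */

/* Simulates n work-items that each do "shared = shared + 1" without any
 * serialization, in the interleaving where all reads happen before all
 * writes. Every work-item sees the old value, so updates are lost. */
int unserialized_increments(int n)
{
    assert(n > 0 && n <= MAX_ITEMS);

    int shared = 0;
    int seen[MAX_ITEMS];

    for (int i = 0; i < n; ++i)
        seen[i] = shared;        /* every work-item reads 0 */
    for (int i = 0; i < n; ++i)
        shared = seen[i] + 1;    /* every work-item writes back 1 */

    return shared;               /* 1, not n */
}

/* The same n increments as atomic read-modify-write operations. Each
 * fetch-add is indivisible, so no increment can be lost, whatever the
 * interleaving of the callers. */
int atomic_increments(int n)
{
    assert(n > 0);

    atomic_int shared = 0;

    for (int i = 0; i < n; ++i)
        atomic_fetch_add(&shared, 1);

    return atomic_load(&shared); /* n */
}
```

With four simulated work-items, `unserialized_increments(4)` returns 1 while `atomic_increments(4)` returns 4. The sketch is sequential on purpose so that the bad interleaving is reproducible; on a real device the unserialized result would vary from run to run.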
          • Racing in memory writ
            LeeHowes

            I'm not sure I understand the question. What kind of race condition are you getting, and how would you imagine it being fixed? Even if, as Micah suggests, you serialize writes, you still have no guarantee that reads from another work-group will be serialized with them the way you expect, so you should always expect unpredictable read/write ordering. At best you could guarantee the order within the reads and writes for a given wavefront - and in OpenCL you can't completely because of the memory consistency model. You can insert fences to give some level of control, though.

            • Racing in memory writ
              MicahVillmow
              Lee,
              The problem is that Rotor is having multiple work-items update the same location in global memory. Without some serialization mechanism (i.e. atomic ops), there will be a race condition on the update of the memory. Fences give a way to synchronize between reads and writes, but they won't serialize the update operation; atomics will.
                • Racing in memory writ
                  nou

                  Well, what about this? I need to check for errors in a kernel, so I create an error buffer.

                  Can I do this?

                  if(is error happend)err[0] = 1;

                  I mean, will that 1 be there for sure? On a 5xxx card I can use atomics, but what about 4xxx? Will it work reliably?

                    • Racing in memory writ
                      LeeHowes

                      nou, your code is fine.

                      On the other hand if your code said this:

                      if(is error happend)err[0] = 1; else err[0] = 0;

                      Then the output is undefined unless "is error happend" is the same across every work item on the device.

                       

                        • Racing in memory writ
                          rotor

                           

                          Originally posted by: LeeHowes

                          nou, your code is fine. On the other hand if your code said this:

                          if(is error happend)err[0] = 1; else err[0] = 0;

                          Then the output is undefined unless "is error happend" is the same across every work item on the device.

                          Hi Lee,

                          Can you explain in more detail why the result is undefined unless "is error happend"? What I think is that if the error has not happened, it will jump to the else branch, which yields err[0]=0.

                           

                            • Racing in memory writ
                              LeeHowes

                              You are right, of course. It will jump to the else branch.

                               

                              What I mean is not that the result is undefined unless "is error happened" is true. I mean that if "is error happened" is true for one thread, and not true for another thread then in what order do those threads write to err[0]? If the successful one writes a 0 first, and the one with an error writes a 1 afterwards then you end up with a 1. What if the 0 write happened after the 1? Then you'd think there'd been no error because err[0] would be 0 even though one of the threads recorded an error. You could only guarantee the answer if "is error happened" was either true for every thread or false for every thread.

                               

                              This is true whether you have the 4 or so threads on a multicore CPU or the 40 or so threads you're likely to have running on a 5870. You might be lucky and find that each lane of a SIMD has a predictable ordering through the memory controller (I don't know if that's the case), but between wavefronts you definitely won't have a predictable ordering, so there would be no way to know whether the last write would be a 1 or a 0.
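The ordering argument above can be checked with a small host-side C sketch that applies the work-items' writes to `err[0]` in an explicit order. It is a sequential simulation, not kernel code. The function names and the `order` array are hypothetical, and it assumes the host clears `err[0]` to 0 before the kernel is launched, which the thread does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* nou's version: only failing work-items write, and they all write 1.
 * order[i] is the index of the work-item whose write lands i-th. */
int flag_only(const bool *failed, const int *order, int n)
{
    assert(n > 0);

    int err = 0;                 /* assumed: buffer cleared by the host */
    for (int i = 0; i < n; ++i)
        if (failed[order[i]])
            err = 1;             /* every writer stores the same value */
    return err;
}

/* The if/else version: every work-item writes, so whichever write
 * lands last decides the final value. */
int flag_or_clear(const bool *failed, const int *order, int n)
{
    assert(n > 0);

    int err = 0;
    for (int i = 0; i < n; ++i)
        err = failed[order[i]] ? 1 : 0;
    return err;
}
```

For three work-items where only work-item 1 fails, `flag_only` returns 1 for every write order. `flag_or_clear` returns 1 only if work-item 1 happens to write last, and 0 otherwise, which is the lost error report described above.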

                      • Racing in memory writ
                        MicahVillmow
                        nou,
                        If all work-items write the same value, then it doesn't matter which work-item wins on the race condition. So there should be no issue on any device.
                          • Racing in memory writ
                            rotor

                            Thanks all for your answers. I'm sorry for the late reply, I'm just back in town after a long trip.

                            I think Micah is right: if a race happens on a write, we get an unexpected result no matter who wins. So, based on my experience, for now we just have to reorganize our access scheme so that there is no overlap in the memory being written.

                            I talked to several people, and it's likely there is currently no hardware or software solution for handling memory write races.

                            Thanks,

                            Roto
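One common way to reorganize writes so that they never overlap is to give each work-item its own output slot and combine the slots in a separate pass. The sketch below shows the idea in host-side C. The thread does not say what rotor's kernel actually computes, so the summation, the function names, and the `per_item` chunking are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Phase 1 (stands in for the kernel): the loop over gid plays the role of
 * the work-items. Work-item gid reads its own chunk of the input and writes
 * only to partial[gid], so no two work-items ever write the same location. */
void write_private_slots(const int *input, int *partial,
                         size_t n_items, size_t per_item)
{
    assert(input != NULL && partial != NULL);

    for (size_t gid = 0; gid < n_items; ++gid) {
        int sum = 0;
        for (size_t k = 0; k < per_item; ++k)
            sum += input[gid * per_item + k];
        partial[gid] = sum;      /* sole writer of this slot */
    }
}

/* Phase 2: a single pass combines the private slots after phase 1 has
 * finished (for example on the host, or in a second kernel launch). */
int combine_slots(const int *partial, size_t n_items)
{
    assert(partial != NULL);

    int total = 0;
    for (size_t i = 0; i < n_items; ++i)
        total += partial[i];
    return total;
}
```

With the input `{1, 2, 3, 4, 5, 6}`, three work-items, and two elements per work-item, phase 1 fills the slots with 3, 7, and 11, and phase 2 returns 21. The cost is one extra buffer and one extra pass, but the result no longer depends on write ordering and needs no atomics, so the scheme also applies to hardware without them.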