4 Replies Latest reply on May 19, 2012 8:14 PM by viscocoa

    Difference in accessing global memory with (int2*) and (int*)?


      I just spent a whole day fixing a small problem. Conceptually, my kernel looks like this:



      __kernel void myKernel(__global int2* data)
      {
           // do something, and get two int values a and b

           data[a].x = something;

           data[b].y = something_else;
      }



      The above code works well on CPUs. On an HD5870, in very rare and random cases, one of the two writes to global memory does not take effect. The data to be written is lost, and the value in data[] is not changed.

      I then changed the code to:



      __kernel void myKernel(__global int* data)
      {
           // do something, and get two int values a and b

           data[a<<1] = something;           // same location as data[a].x

           data[(b<<1)+1] = something_else;  // same location as data[b].y
      }



      The above code works well on the HD5870!
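The flat indexing in the second kernel relies on an int2 occupying two consecutive ints, so that component x of element i sits at int index i<<1 and component y at (i<<1)+1. A minimal host-side C sketch of that mapping follows. The struct int2_t and the helper names are hypothetical stand-ins for illustration, not part of OpenCL:

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's int2: two consecutive 32-bit ints. */
typedef struct { int x; int y; } int2_t;

/* Flat int index of component x of vector element i. */
static size_t flat_index_x(size_t i) { return i << 1; }

/* Flat int index of component y of vector element i. */
static size_t flat_index_y(size_t i) { return (i << 1) + 1; }

/* Returns 1 if the vector view and the flat int view of element i
   refer to the same memory locations, 0 otherwise. */
static int views_alias(int2_t *vec, size_t i)
{
    int *flat = (int *)vec;
    return &vec[i].x == &flat[flat_index_x(i)]
        && &vec[i].y == &flat[flat_index_y(i)];
}
```

So both kernels target the same addresses; what may differ is the width of the store the compiler emits for each form.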



      It looks like writing to the two components of an int2 simultaneously (maybe from two compute units) causes a conflict, and one write overwrites the other. Writes to separate int elements, however, appear to be independent.


      I wonder if this is a bug in the OpenCL implementation.


      The above logic is used in many situations, such as radix sort. I hope this is helpful for those who have the same problem.

        • Re: Difference in accessing global memory with (int2*) and (int*)?

          This is a bug in your code and not in OpenCL as you have a race condition in your writes.


          The reason it works on the CPU is that each work-item is run sequentially; on the GPU they are run in parallel.
            • Re: Difference in accessing global memory with (int2*) and (int*)?

              Hi Micah,


              Thank you for your answer. Does OpenCL not allow two threads to write to two components of the same vector?


              How about local memory? Can I do this in a kernel?


              __local int2 localData[1024];

              localData[a].x = num0;

              localData[b].y = num1;


              This assumes that different threads may write to the same vector.


              Vis Cocoa

                • Re: Difference in accessing global memory with (int2*) and (int*)?

                  Yes, it does permit it, but what he is saying is that the whole problem with your first code is that it lets multiple threads write to the same vector. I am assuming that, because it is a vector type, the compiler loads the entire vector from data[a] (both the x and y components), modifies the .x component (leaving the .y value unchanged), and then writes both back (whether through the cache or not). If another thread is modifying data[b].y, and data[b] happens to be the same vector address, it reads in the original, unmodified value of .x and writes it out again, cancelling the first thread's result. This can happen both ways. This is a guess, of course.
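The guess above can be sketched as a deterministic, single-threaded C simulation. It is only an illustration of the hypothesised interleaving, not observed HD5870 behaviour; the struct int2_t and both function names are hypothetical:

```c
/* Hypothetical stand-in for OpenCL's int2. */
typedef struct { int x; int y; } int2_t;

/* Assumed interleaving: two work-items each load the whole vector,
   change one component, and store the whole vector back, with both
   loads happening before either store. */
static int2_t whole_vector_stores(int2_t mem, int new_x, int new_y)
{
    int2_t t0 = mem;   /* work-item 0 loads {x, y} */
    int2_t t1 = mem;   /* work-item 1 loads {x, y} */
    t0.x = new_x;      /* work-item 0 modifies only x */
    t1.y = new_y;      /* work-item 1 modifies only y */
    mem = t0;          /* work-item 0 stores {new_x, old y} */
    mem = t1;          /* work-item 1 stores {old x, new_y}: new_x is lost */
    return mem;
}

/* For comparison: each work-item stores only its own 32-bit component,
   so neither store touches the other component. */
static int2_t component_stores(int2_t mem, int new_x, int new_y)
{
    mem.x = new_x;
    mem.y = new_y;
    return mem;
}
```

Starting from {1, 2} and writing x = 10 and y = 20, the whole-vector version ends with {1, 20}, so the write to x vanishes even though neither work-item depends on the other's value. The component-wise version ends with {10, 20}.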


                  I assume the problem is inconsistent across platforms not because of the serial nature of the CPU (it is at least parallel across multiple cores, right? So you could have the same problem there), but rather because of how the compiler handles loading and storing of vector types on different devices. On the GPU it does not attempt to address an individual component but performs its writes at vector granularity, and the x, y and z components simply reflect register organisation. On the CPU, data[a].x is translated to a single 32-bit int load/store.


                  I say this because how else could you explain a write-only race (with no dependent reads) that leaves the data unmodified by any of the competing threads?