2 Replies Latest reply on May 13, 2011 8:24 PM by himanshu.gautam

    Coalescing reads and writes!



      Looking at the generic kernel for multiplying two vectors:

      __kernel void mul(__global const float* a, __global const float* b, __global float* c)
      {
          // Each work-item handles the element at its own global index.
          size_t gid = get_global_id(0);

          c[gid] = a[gid] * b[gid];
      }


      Is it possible to get coalesced reads/writes here? How? If I understand the indexing correctly, within each compute unit the local id increases in steps of one, and this happens on all compute units concurrently?


        • Coalescing reads and writes!

          This kernel should already coalesce all loads and stores correctly, assuming your work group size is large enough (64 is a good rule of thumb on AMD GPUs). One way of thinking about coalescing is that the ith thread in a work group should read or write the ith element of an array (other patterns work too, but this is the simplest to analyze). Since global work item 0 has local id 0, global work item 1 has local id 1, etc., the ith thread is reading the ith element. In later work groups the global id is just the local id plus a constant offset for that group, so consecutive work items still touch consecutive elements.
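
          To make the "ith thread touches the ith element" rule concrete, here is a rough host-side sketch in plain C (not OpenCL) that counts how many memory segments one work group would touch for a given stride. Everything in it is an assumption for illustration: the function name segments_touched, the 256-byte segment size, and the simplified model of one transaction per segment are hypothetical and do not describe any particular GPU.

```c
#include <stddef.h>

/* Hypothetical parameters for illustration only; real hardware differs. */
#define WORK_GROUP_SIZE 64   /* work-items per group (rule of thumb from the reply) */
#define ELEMENT_BYTES   4    /* size of one float element */
#define SEGMENT_BYTES   256  /* ASSUMED bytes served by one memory transaction */

/* Count the distinct memory segments touched by one work group when the
 * work-item with global id gid accesses element gid * stride.
 * stride == 1 corresponds to the a[gid] pattern of the kernel in the question. */
size_t segments_touched(size_t group_id, size_t stride)
{
    size_t count = 0;
    size_t last_segment = (size_t)-1;  /* sentinel: no segment seen yet */

    for (size_t lid = 0; lid < WORK_GROUP_SIZE; ++lid) {
        /* Global id = group offset + local id (assuming a zero global offset). */
        size_t gid = group_id * WORK_GROUP_SIZE + lid;
        size_t byte_offset = gid * stride * ELEMENT_BYTES;
        size_t segment = byte_offset / SEGMENT_BYTES;

        /* Offsets never decrease as lid grows, so comparing against the
         * previous segment is enough to detect a new one. */
        if (segment != last_segment) {
            ++count;
            last_segment = segment;
        }
    }
    return count;
}
```

          Under these assumptions, a stride of 1 keeps a whole work group inside a single segment, whichever group it is, while a stride of 64 makes every work item land in its own segment. That is the intuition behind calling the first pattern coalesced.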