Archives Discussions

Atmapuri · ‎05-07-2011

Hi!

Looking at the generic kernel for multiplying two vectors:

__kernel void mul(__global const float* a, __global const float* b, __global float* c)
{
    // One work item per element; gid indexes all three buffers.
    size_t gid = get_global_id(0);
    c[gid] = a[gid] * b[gid];
}

Is it possible to get coalesced reads/writes here? How? If I understand the indexing correctly, within each compute unit the local ID goes up in increments of one, and this happens on all compute units concurrently?

Thanks!
Atmapuri

rick_weber · ‎05-09-2011

This kernel should already coalesce all loads and stores correctly, assuming your work group size is large enough (64 is a good rule of thumb on AMD GPUs). One way of thinking about coalescing is that the ith thread in a work group should read or write the ith element of an array (other patterns can coalesce too, but this is the simplest to analyze). Since global work item 0 has local ID 0, global work item 1 has local ID 1, and so on, with each later work group continuing at the next consecutive elements, the ith thread is reading the ith element.
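To make the index argument concrete, here is a minimal host-side sketch in plain C (not OpenCL code, and not something posted in the thread). It models which element each work item of a group touches and checks that the accesses are consecutive. The function names global_id and group_access_is_contiguous and the stride parameter are hypothetical, introduced only for illustration, and the model assumes a 1-D NDRange with no global offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Global ID of a work item in a 1-D NDRange with no global offset.
   Under that assumption this equals what get_global_id(0) returns. */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Returns true if neighbouring work items of one group touch neighbouring
   array elements when each item accesses element (global_id * stride).
   stride == 1 is the access pattern of the mul kernel in the question. */
bool group_access_is_contiguous(size_t group_id, size_t local_size,
                                size_t stride)
{
    for (size_t lid = 1; lid < local_size; ++lid) {
        size_t prev = global_id(group_id, local_size, lid - 1) * stride;
        size_t curr = global_id(group_id, local_size, lid) * stride;
        if (curr != prev + 1) {
            return false; /* gap between neighbouring work items */
        }
    }
    return true;
}
```

For example, global_id(2, 64, 5) is 133, and group_access_is_contiguous(3, 64, 1) holds while a stride of 2 fails the check. This only models the index pattern; whether the hardware actually merges the accesses also depends on alignment and the device.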

himanshu_gautam · ‎05-13-2011

I would suggest using float4 instead of float. That generally generates better-coalesced fetch and write requests.
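A float4 variant might look like the sketch below. It is written as host-side C: the kernel is an OpenCL C source string, the way host code commonly embeds it, followed by a plain C reference for checking device results. The names mul4, mul4_source and mul4_reference are hypothetical. The sketch assumes the element count is a multiple of 4 and that the kernel is launched with one work item per float4 (n / 4 work items). Whether it is actually faster depends on the device.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source for a float4 version of the kernel from the question.
   Assumption: launched with n / 4 work items, n a multiple of 4. */
const char *const mul4_source =
    "__kernel void mul4(__global const float4* a,\n"
    "                   __global const float4* b,\n"
    "                   __global float4* c)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    c[gid] = a[gid] * b[gid];  /* component-wise multiply */\n"
    "}\n";

/* Host-side reference: the outer loop plays the role of the work items,
   the inner loop covers the four components each work item handles. */
void mul4_reference(const float *a, const float *b, float *c, size_t n)
{
    assert(n % 4 == 0);
    for (size_t gid = 0; gid < n / 4; ++gid) {
        for (size_t k = 0; k < 4; ++k) {
            size_t i = 4 * gid + k;
            c[i] = a[i] * b[i];
        }
    }
}
```

Each work item now moves 16 contiguous bytes per buffer instead of 4, so consecutive work items still touch consecutive memory, just in larger pieces.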

Coalescing reads and writes!