Atmapuri
Journeyman III

Coalescing reads and writes!

Hi!

Looking at the generic kernel for multiplying two vectors:

__kernel void mul(__global const float* a, __global const float* b, __global float* c)
{
    /* One work item per element; gid must be declared before use. */
    size_t gid = get_global_id(0);
    c[gid] = a[gid] * b[gid];
}


Is it possible to make the reads and writes coalesce here, and if so, how? If I understand the indexing correctly, within each compute unit the local ID increases in steps of one, and this happens on all compute units concurrently. Is that right?

Thanks!
Atmapuri

2 Replies
rick_weber
Adept II

This kernel should already coalesce all loads and stores, assuming your work-group size is large enough (64 is a good rule of thumb on AMD GPUs). One way to think about coalescing is that the ith thread in a work group should read or write the ith element of an array (other access patterns can coalesce too, but this is the simplest to analyze). Since global work item 0 has local ID 0, global work item 1 has local ID 1, and so on, the ith thread reads the ith element. In later work groups the local ID restarts at 0, but the global IDs, and therefore the addresses, stay consecutive.
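
To make the "ith thread touches the ith element" rule concrete, here is a minimal host-side sketch in plain C (not a kernel, and not taken from any AMD tool). It assumes a 1-D range with no global offset and a work-group size of 64; GROUP_SIZE, the stride parameter, and all function names are made up for illustration. With stride 1 it models the kernel in the question; stride 2 is only there as a contrasting pattern with gaps between neighbouring work items.

```c
#include <stddef.h>

/* Assumed work-group size; 64 matches the rule of thumb quoted above. */
#define GROUP_SIZE 64

/* Global id as OpenCL defines it for a 1-D range with no global offset:
   group_id * local_size + local_id. */
size_t global_id(size_t group_id, size_t local_id)
{
    return group_id * GROUP_SIZE + local_id;
}

/* Byte offset touched by work item `gid` when it accesses a[gid * stride].
   stride == 1 corresponds to the kernel in the question. */
size_t byte_offset(size_t gid, size_t stride)
{
    return gid * stride * sizeof(float);
}

/* Returns 1 if neighbouring work items of one group touch neighbouring
   floats with no gaps (the simplest coalesced pattern), otherwise 0. */
int group_is_contiguous(size_t group_id, size_t stride)
{
    for (size_t i = 1; i < GROUP_SIZE; ++i) {
        size_t prev = byte_offset(global_id(group_id, i - 1), stride);
        size_t curr = byte_offset(global_id(group_id, i), stride);
        if (curr != prev + sizeof(float))
            return 0;
    }
    return 1;
}
```

Calling group_is_contiguous(g, 1) returns 1 for any group g, which is the sense in which the original kernel is already coalesced. Whether a strided pattern actually costs extra memory transactions depends on the hardware, so treat the stride-2 case as an illustration of the access pattern only.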


I would suggest using float4 instead of float. That generally produces better-coalesced fetch and write requests.
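
As a rough illustration of the float4 suggestion, the sketch below emulates in plain C what each work item of a float4 version of the kernel would do. In OpenCL C the arguments would presumably become __global float4 pointers and the global size would shrink to a quarter; that kernel is not shown in this thread, so everything here is an assumption, including the function names and the scalar handling of leftover elements when the length is not a multiple of 4.

```c
#include <stddef.h>

/* Number of vector work items needed to cover n floats, 4 floats each. */
size_t vector_work_items(size_t n)
{
    return n / 4;
}

/* What one work item of a float4 kernel would do: read 4 consecutive floats
   from each input (16 bytes) and write 4 consecutive floats to the output. */
void mul4_work_item(size_t gid, const float *a, const float *b, float *c)
{
    size_t base = 4 * gid;
    for (size_t k = 0; k < 4; ++k)
        c[base + k] = a[base + k] * b[base + k];
}

/* Host-side emulation of the whole launch: the loop over gid stands in for
   the work items the GPU would run in parallel. */
void mul_float4_emulated(const float *a, const float *b, float *c, size_t n)
{
    size_t items = vector_work_items(n);
    for (size_t gid = 0; gid < items; ++gid)
        mul4_work_item(gid, a, b, c);

    /* Assumed tail handling: the n % 4 leftover elements are done as scalars. */
    for (size_t i = 4 * items; i < n; ++i)
        c[i] = a[i] * b[i];
}
```

Work item i still touches the ith 16-byte chunk, so the pattern stays contiguous while each work item moves four times as much data per request. On a real device the buffers would also need to satisfy float4 alignment, which this emulation does not check.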
