OpenCL Coalescing To Global Memory

Discussion created by toddwbrownjr on Feb 9, 2010
Latest reply on Mar 1, 2010 by MicahVillmow

Hello all,

     I have an HD 5870 with the ATI Stream SDK v2.0 installed, and I have a question about coalescing global memory reads/writes in a kernel.  The documentation says a wavefront is composed of 64 work-items, and it appears to suggest that 32 work-items are processed at a time.  If, within a given half-wavefront, the global memory addresses are not aligned and/or not completely sequential across increasing work-item IDs, will the hardware make 32 individual global accesses (horrible bandwidth), or will it issue as few coalesced global accesses as necessary to fulfill the half-wavefront request (better bandwidth)?