global memory optimization - don't get it

Discussion created by diapolo on Apr 25, 2010
Latest reply on May 12, 2010 by LeeHowes

I am currently reading through the AMD PDF ATI_Stream_SDK_OpenCL_Programming_Guide and have some questions about its global memory optimization section.

The guide says: "Note that the memory segments indexed through base addresses A0 to An are not required to line up sequentially; for optimal performance, they must be aligned to 128 bytes and must not overlap."

My kernel currently uses a 256 MB array of uint2 values. I made sure that the host memory is aligned (to 16 bytes) by allocating it via: cl_uint *searchStrings = (cl_uint*)_aligned_malloc(sizeof(cl_uint2) * numCombinations, 16);

The array is copied into a buffer (mem object) with a write-buffer call and passed to my kernel. Inside the kernel it is only read: each work-item reads it 8 times, and each value is used in an addition.

But I'm really unsure what aligning to 128 bytes actually means and how to do it.