Archives Discussions

tomhammo · ‎01-20-2010

From the performance guide:

"The GPU memory subsystem can coalesce multiple concurrent accesses to global memory, provided the memory addresses increase sequentially across the work-items in the wavefront and start on a 128-byte alignment boundary."

So code like the following, where each work-item reads or writes the element at its own global ID, should be the most efficient access pattern:

__global float* data = ...;

data[get_global_id(0)] = ...;

... = data[get_global_id(0)];

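However, does this also apply to vector data, where each work-item accesses a 16-byte float4 and consecutive work-items are 16 bytes apart instead of 4?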

__global float4* data = ...;

data[get_global_id(0)] = ...;

... = data[get_global_id(0)];
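
To make the question concrete, here is a rough host-side C sketch of the byte offsets that the two access patterns produce. It is only a model of the addressing, not a statement about what the hardware does. The wavefront size of 64 is an assumption, and the helper names (byte_offset, is_contiguous, wavefront_bytes) are made up for illustration.

```c
#include <stddef.h>

/* Assumption: 64 work-items per wavefront. Adjust for the target device. */
#define WAVEFRONT_SIZE 64

/* Byte offset touched by work-item `gid` when it accesses data[gid]
   and each element is `elem_size` bytes wide (4 for float, 16 for float4). */
static size_t byte_offset(size_t gid, size_t elem_size)
{
    return gid * elem_size;
}

/* Returns 1 if every work-item in the wavefront touches the element that
   starts exactly where the previous work-item's element ends (no gaps). */
static int is_contiguous(size_t elem_size)
{
    for (size_t gid = 1; gid < WAVEFRONT_SIZE; ++gid) {
        size_t prev_end = byte_offset(gid - 1, elem_size) + elem_size;
        if (byte_offset(gid, elem_size) != prev_end)
            return 0;
    }
    return 1;
}

/* Total number of bytes one wavefront touches in a single access. */
static size_t wavefront_bytes(size_t elem_size)
{
    return (size_t)WAVEFRONT_SIZE * elem_size;
}
```

Under these assumptions both patterns are gap-free and increase with the work-item ID: is_contiguous(4) and is_contiguous(16) both return 1. The difference is the footprint, since wavefront_bytes(4) is 256 bytes while wavefront_bytes(16) is 1024 bytes. Whether the memory subsystem coalesces the wider float4 accesses as well as the float ones is exactly what I am asking.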

regards,

- Tom

vector vs scalar memory operations