vector vs scalar memory operations

Discussion created by tomhammo on Jan 20, 2010
Latest reply on Jan 21, 2010 by Fr4nz

From the performance guide:

"The GPU memory subsystem can coalesce multiple concurrent accesses to global memory, provided the memory addresses increase sequentially across the work-items in the wavefront and start on a 128-byte alignment boundary."

So code like the following, where each work-item reads or writes the element at its own global ID, should be the most efficient:

__global float* data = ...;
data[get_global_id(0)] = ...;
... = data[get_global_id(0)];

However, does this also apply to vector data, where each work-item accesses a 16-byte float4 instead of a 4-byte float?

__global float4* data = ...;
data[get_global_id(0)] = ...;
... = data[get_global_id(0)];


regards,

- Tom
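
To make the addressing in the two cases concrete, here is a rough host-side C sketch of the arithmetic only. It assumes a wavefront of 64 work-items (WAVEFRONT_SIZE is an assumption, not something stated in the quote) and a buffer whose base address is itself 128-byte aligned. The helper names are made up for illustration. It only checks the condition quoted from the guide (sequentially increasing addresses and a 128-byte aligned start). It does not say whether the hardware handles float4 accesses as efficiently as float accesses.

```c
#include <stddef.h>

#define WAVEFRONT_SIZE 64   /* assumption: work-items per wavefront */
#define ALIGNMENT      128  /* alignment boundary from the quoted guide text */

/* Byte offset (from the buffer base) touched by work-item `gid` when it
   indexes an array whose elements are `elem_size` bytes each. */
static size_t byte_offset(size_t gid, size_t elem_size)
{
    return gid * elem_size;
}

/* Returns 1 if a wavefront whose first work-item is `first_gid` starts on a
   128-byte boundary and its offsets increase by exactly one element per
   work-item; returns 0 otherwise. Hypothetical helper for illustration. */
static int meets_quoted_condition(size_t first_gid, size_t elem_size)
{
    if (byte_offset(first_gid, elem_size) % ALIGNMENT != 0)
        return 0;
    for (size_t i = 1; i < WAVEFRONT_SIZE; ++i) {
        size_t prev = byte_offset(first_gid + i - 1, elem_size);
        size_t cur  = byte_offset(first_gid + i, elem_size);
        if (cur - prev != elem_size)
            return 0;
    }
    return 1;
}

/* Total number of bytes one wavefront touches in a single access. */
static size_t wavefront_span_bytes(size_t elem_size)
{
    return WAVEFRONT_SIZE * elem_size;
}
```

With 4-byte elements (float) a wavefront starting at global ID 0 covers 256 contiguous bytes; with 16-byte elements (float4) it covers 1024 contiguous bytes. In both cases the offsets are sequential and the start is aligned, so by this arithmetic alone both patterns satisfy the wording of the guide.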