


is there any performance difference between int* + vload and intn use?

hi all,

imagine that I have a kernel like:

__kernel void sum1 (__global int* a, __global int* b, __global int* c)
{
    int tid = 4*get_global_id(0);

    for (int i = 0; i < 4; ++i)
        c[tid+i] = a[tid+i] + b[tid+i];
}

and I want to vectorize it. My question is: will this new kernel

__kernel void sum2 (__global int* a, __global int* b, __global int* c)
{
    int tid = get_global_id(0);

    vstore4(vload4(tid, a) + vload4(tid, b), tid, c);
}

run as fast as this one (with a and b converted to cl_int4 on the host side):

__kernel void sum3 (__global int4* a, __global int4* b, __global int4* c)
{
    int tid = get_global_id(0);

    c[tid] = a[tid] + b[tid];
}

? I mean, do I need to change my host code to vectorize all my arrays (which implies additional copies for the scalar->vector conversion, depending on the data structure) and modify the kernel input types, or will vload/vstore be equally well optimized in terms of memory reads/writes, vectorized computation and register use?

1 Reply

Yes, there is a performance difference.
vloadn performs scalar loads that together make up one vector, through a pointer that only has to be aligned to the scalar type.
Loads through an intn pointer are true vector loads, through a pointer that is aligned to the vector size.

Because the pointer passed to vload can be unaligned, there is no load coalescing on hardware that does not support unaligned loads.
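To make the alignment point concrete, here is a small host-side sketch in plain C (not from the original thread). The helper names is_aligned, vload4_address and can_view_as_int4 are made up for illustration, and int32_t stands in for cl_int. It assumes a contiguous host array of 32-bit ints and only models the address arithmetic: vload4(offset, p) reads starting at p + 4*offset, so it inherits whatever alignment p happens to have, while an int4 pointer must start on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on a multiple of `alignment` bytes. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical helper: first element read by vload4(offset, p).
 * vload4 addresses memory as p + 4*offset, so p itself only has to be
 * aligned to the scalar type (4 bytes for a 32-bit int). */
static const int32_t *vload4_address(const int32_t *p, size_t offset)
{
    return p + 4 * offset;
}

/* Hypothetical helper: can a contiguous int array be viewed as an array
 * of int4 without copying? Under this sketch's assumptions it needs a
 * 16-byte aligned base and a length that is a multiple of 4. */
static int can_view_as_int4(const int32_t *p, size_t n_ints)
{
    return is_aligned(p, 4 * sizeof(int32_t)) && (n_ints % 4 == 0);
}
```

With a 16-byte aligned array buf, vload4_address(buf, 1) is still 16-byte aligned, but vload4_address(buf + 1, 1) is not, which is the case the compiler has to allow for when it sees vload4 on an int pointer. As far as the host data goes, a contiguous int array that passes can_view_as_int4 already has the same memory layout as an array of cl_int4, so under that assumption switching the kernel to int4 pointers should not by itself require an extra copy.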