Archives Discussions

laobrasuca · 06-10-2011

Hi all,

Imagine that I have a kernel like this:

__kernel void sum1 (__global int* a, __global int* b, __global int* c)
{
    // each work-item adds four consecutive ints
    int tid = 4*get_global_id(0);
    for (int i = 0; i < 4; ++i)
        c[tid+i] = a[tid+i] + b[tid+i];
}

and I want to vectorize it. My question is: will this new kernel

__kernel void sum2 (__global int* a, __global int* b, __global int* c)
{
    int tid = get_global_id(0);
    // vload4(tid, p) reads p[4*tid] .. p[4*tid+3]; vstore4 writes them back
    vstore4(vload4(tid, a) + vload4(tid, b), tid, c);
}

run as fast as this one (with a, b and c converted to cl_int4 arrays on the host side):

__kernel void sum3 (__global int4* a, __global int4* b, __global int4* c)
{
    // each work-item adds one int4 (four ints) at once
    int tid = get_global_id(0);
    c[tid] = a[tid] + b[tid];
}

? In other words, do I need to change my host code to store all my arrays as vector types (which can mean extra copies for the scalar-to-vector conversion, depending on the data structure) and change the kernel argument types, or will vload/vstore be optimized equally well in terms of memory reads/writes and vectorized computation/register usage?
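To make the question concrete, here is a host-side C sketch (plain C, not OpenCL, and not the poster's code) that emulates the three kernels serially, one call per work-item, and checks that they compute the same result. The names `int4_emul`, `vload4_emul`, `vstore4_emul`, `add4` and `kernels_agree` are hypothetical stand-ins. The indexing follows the OpenCL C definition of `vload4(offset, p)`, which reads `p[4*offset]` through `p[4*offset+3]`. The sketch says nothing about speed; it only shows that the three versions are semantically equivalent, so the difference can only lie in how the loads and stores are issued.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL's int4 (four packed ints). */
typedef struct { int s[4]; } int4_emul;

/* Emulates vload4(offset, p): four scalar reads of p[4*offset .. 4*offset+3]. */
static int4_emul vload4_emul(size_t offset, const int *p) {
    int4_emul v;
    for (int k = 0; k < 4; ++k) v.s[k] = p[4 * offset + k];
    return v;
}

/* Emulates vstore4(v, offset, p): writes the same four elements. */
static void vstore4_emul(int4_emul v, size_t offset, int *p) {
    for (int k = 0; k < 4; ++k) p[4 * offset + k] = v.s[k];
}

/* Component-wise addition, like + on OpenCL vector types. */
static int4_emul add4(int4_emul x, int4_emul y) {
    int4_emul r;
    for (int k = 0; k < 4; ++k) r.s[k] = x.s[k] + y.s[k];
    return r;
}

/* One function per kernel; gid plays the role of get_global_id(0). */
static void sum1(const int *a, const int *b, int *c, size_t gid) {
    size_t tid = 4 * gid;
    for (size_t i = 0; i < 4; ++i) c[tid + i] = a[tid + i] + b[tid + i];
}

static void sum2(const int *a, const int *b, int *c, size_t gid) {
    vstore4_emul(add4(vload4_emul(gid, a), vload4_emul(gid, b)), gid, c);
}

static void sum3(const int4_emul *a, const int4_emul *b, int4_emul *c, size_t gid) {
    c[gid] = add4(a[gid], b[gid]);
}

/* Runs all three "kernels" over N work-items; returns 1 if the results match. */
int kernels_agree(void) {
    enum { N = 8 };
    int a[4 * N], b[4 * N], c1[4 * N], c2[4 * N];
    int4_emul a4[N], b4[N], c3[N];

    for (int i = 0; i < 4 * N; ++i) { a[i] = i; b[i] = 100 * i; }
    /* Host-side scalar -> vector repacking that sum3 expects. */
    for (int g = 0; g < N; ++g)
        for (int k = 0; k < 4; ++k) {
            a4[g].s[k] = a[4 * g + k];
            b4[g].s[k] = b[4 * g + k];
        }

    for (size_t g = 0; g < N; ++g) {
        sum1(a, b, c1, g);
        sum2(a, b, c2, g);
        sum3(a4, b4, c3, g);
    }

    if (memcmp(c1, c2, sizeof c1) != 0) return 0;
    for (int g = 0; g < N; ++g)
        for (int k = 0; k < 4; ++k)
            if (c3[g].s[k] != c1[4 * g + k]) return 0;
    return 1;
}
```

Calling `kernels_agree()` from a `main` should return 1.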

MicahVillmow · 06-10-2011

Yes, there is a performance difference.
vload performs scalar loads of a vector's worth of data from a pointer that only needs scalar alignment.
intn loads are true vector loads of the same size from a pointer that must be aligned to the vector size.
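The alignment difference can be illustrated on the host with plain C. This is a sketch under stated assumptions: `is_aligned` and `vload_source_demo` are hypothetical helpers, and it assumes a 4-byte `int`. In OpenCL C, a vector type in memory is aligned to its size in bytes (16 for `int4`), whereas `vload4` on an `int` pointer only requires the address to be aligned to `int`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if p is a multiple of `bytes`, else 0. */
int is_aligned(const void *p, size_t bytes) {
    return ((uintptr_t)p % bytes) == 0;
}

/* buf is forced to 16-byte alignment, so buf + 1 is int-aligned but not
   16-byte aligned: acceptable as a vload4-style source, but not a valid
   address for an int4 (assumes sizeof(int) == 4). */
int vload_source_demo(void) {
    _Alignas(16) int buf[8] = {0};
    const int *p = buf + 1;
    return is_aligned(buf, 16)
        && is_aligned(p, _Alignof(int))
        && !is_aligned(p, 16);
}
```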

Because the pointer passed to vload may be unaligned with respect to the vector size, hardware that does not support unaligned loads cannot coalesce these accesses into vector loads.
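One simplified way to see why alignment matters for coalescing: if memory is served in aligned 16-byte segments (an assumed model for illustration, not a description of any specific GPU), a 16-byte access that starts on a 16-byte boundary fits in one segment, while one that starts anywhere else straddles two. `segments_touched` is a hypothetical helper.

```c
#include <stddef.h>

/* Number of aligned, segment_bytes-sized segments covered by an access of
   access_bytes starting at byte address addr (simplified model). */
size_t segments_touched(size_t addr, size_t access_bytes, size_t segment_bytes) {
    size_t first = addr / segment_bytes;
    size_t last = (addr + access_bytes - 1) / segment_bytes;
    return last - first + 1;
}
```

With 16-byte segments, an aligned int4 read at byte 32 gives `segments_touched(32, 16, 16) == 1`, while a four-int read starting one `int` later, at byte 36, gives 2.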

Is there any performance difference between int* + vload and intn use?