laobrasuca

is there any performance difference between int* + vload and intn use?

Discussion created by laobrasuca on Jun 10, 2011
Latest reply on Jun 10, 2011 by MicahVillmow

hi all,

imagine that I have a kernel like:

__kernel void sum1 (__global int* a, __global int* b, __global int* c)
{
    int tid = 4*get_global_id(0);
    for (int i = 0; i < 4; ++i)
        c[tid+i] = a[tid+i] + b[tid+i];
}

and I want to vectorize it. My question is: will this new kernel

__kernel void sum2 (__global int* a, __global int* b, __global int* c)
{
    int tid = get_global_id(0);
    vstore4(vload4(tid, a) + vload4(tid, b), tid, c);
}

run as fast as this one (with a and b converted to cl_int4 on the host side):

__kernel void sum3 (__global int4* a, __global int4* b, __global int4* c)
{
    int tid = get_global_id(0);
    c[tid] = a[tid] + b[tid];
}

? In other words, do I need to change my host code to vectorize all my arrays (which, depending on the data structure, implies additional copies to convert from scalar to vector types) and change the kernel argument types, or will vload/vstore be equally well optimized in terms of memory reads/writes, vectorized computation and register use?
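For reference, here is a small plain C sketch (not OpenCL) that emulates on the host what the three kernels do per work-item, to check that they read and write exactly the same elements. It relies on the OpenCL C definition of vload4(offset, p), which reads the four elements starting at p + 4*offset (vstore4 writes them back the same way). The int4 struct, add4 and kernels_agree are hypothetical helpers made up for this illustration. The sketch says nothing about speed: whether a given OpenCL compiler turns vload4/vstore4 into the same wide memory accesses as an int4 pointer dereference is implementation-dependent. One documented difference: vload4 on an int* only requires the address to be aligned to sizeof(int), whereas dereferencing a __global int4* requires 16-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for OpenCL C's int4 (four packed ints). */
typedef struct { int s[4]; } int4;

/* Emulates vload4(offset, p): returns p[4*offset] .. p[4*offset + 3]. */
static int4 vload4(size_t offset, const int *p) {
    int4 v;
    memcpy(v.s, p + 4 * offset, sizeof v.s);
    return v;
}

/* Emulates vstore4(data, offset, p): writes p[4*offset] .. p[4*offset + 3]. */
static void vstore4(int4 data, size_t offset, int *p) {
    memcpy(p + 4 * offset, data.s, sizeof data.s);
}

/* Component-wise addition, i.e. the int4 '+' operator. */
static int4 add4(int4 x, int4 y) {
    int4 r;
    for (int i = 0; i < 4; ++i) r.s[i] = x.s[i] + y.s[i];
    return r;
}

/* Bodies of sum1, sum2 and sum3 for a single work-item with id gid. */
static void sum1(const int *a, const int *b, int *c, size_t gid) {
    size_t tid = 4 * gid;
    for (int i = 0; i < 4; ++i) c[tid + i] = a[tid + i] + b[tid + i];
}
static void sum2(const int *a, const int *b, int *c, size_t gid) {
    vstore4(add4(vload4(gid, a), vload4(gid, b)), gid, c);
}
static void sum3(const int4 *a, const int4 *b, int4 *c, size_t gid) {
    c[gid] = add4(a[gid], b[gid]);
}

/* Runs all three "kernels" over n ints (global size n/4) and returns 1
   if they write identical results. n must be a multiple of 4 and <= 64. */
int kernels_agree(size_t n) {
    int a[64], b[64], c1[64], c2[64];
    int4 a4[16], b4[16], c3[16];
    assert(n % 4 == 0 && n <= 64);
    for (size_t i = 0; i < n; ++i) { a[i] = (int)i; b[i] = 100 * (int)i; }
    /* Here an int4 array and an int array of 4x the length share the same
       byte layout, so the "conversion" is a plain byte copy. */
    memcpy(a4, a, n * sizeof(int));
    memcpy(b4, b, n * sizeof(int));
    for (size_t gid = 0; gid < n / 4; ++gid) {
        sum1(a, b, c1, gid);
        sum2(a, b, c2, gid);
        sum3(a4, b4, c3, gid);
    }
    return memcmp(c1, c2, n * sizeof(int)) == 0 &&
           memcmp(c1, c3, n * sizeof(int)) == 0;
}
```

Calling kernels_agree(16) emulates a global size of 4 and should return 1, i.e. the three variants are equivalent in what they compute; the open question in this thread is only how efficiently the hardware executes each one.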
