1 Reply Latest reply on Jun 10, 2011 3:05 PM by MicahVillmow

    is there any performance difference between int* + vload and intn use?

    laobrasuca

      hi all,

      imagine that I have a kernel like:

      __kernel void sum1 (__global int* a, __global int* b, __global int* c)
      {
          // each work-item handles 4 consecutive elements
          int tid = 4*get_global_id(0);
          for (int i = 0; i < 4; ++i)
              c[tid+i] = a[tid+i] + b[tid+i];
      }

      and I want to vectorize it. So my question is, will this new kernel

      __kernel void sum2 (__global int* a, __global int* b, __global int* c)
      {
          int tid = get_global_id(0);
          // vload4/vstore4 take the offset in units of 4 elements
          vstore4(vload4(tid,a) + vload4(tid,b), tid, c);
      }

      run as fast as this one (with a and b converted to cl_int4 on the host side):

      __kernel void sum3 (__global int4* a, __global int4* b, __global int4* c)
      {
          int tid = get_global_id(0);
          c[tid] = a[tid] + b[tid];
      }

      ? In other words: do I need to change my host code to store all my arrays as vector types (which, depending on the data structure, means additional copies for the scalar-to-vector conversion) and change the kernel argument types, or will vload/vstore be just as well optimized in terms of memory reads/writes, vectorized computation, and register use?
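
      To make the memory-layout side of the question concrete, here is a minimal host-side sketch in plain C (not OpenCL device code). It models sum2 and sum3 and shows that both touch exactly the same elements: vload4(tid, p) addresses p[4*tid .. 4*tid+3], which is the same range as element tid of an int4 array. The names int4_model, vload4_model, vstore4_model, add4, sum2_model and sum3_model are hypothetical helpers made up for this illustration, and the struct is assumed to have no padding. The sketch says nothing about speed, which depends on the compiler and the device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL's int4: four contiguous ints
   (assumed to have no padding, which holds on common compilers). */
typedef struct { int s[4]; } int4_model;

/* Models vload4(offset, p): reads p[4*offset .. 4*offset+3]. */
static int4_model vload4_model(size_t offset, const int *p) {
    int4_model v;
    memcpy(v.s, p + 4 * offset, sizeof v.s);
    return v;
}

/* Models vstore4(data, offset, p): writes p[4*offset .. 4*offset+3]. */
static void vstore4_model(int4_model data, size_t offset, int *p) {
    memcpy(p + 4 * offset, data.s, sizeof data.s);
}

/* Component-wise addition, as the + operator does on int4 in OpenCL C. */
static int4_model add4(int4_model x, int4_model y) {
    int4_model r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = x.s[i] + y.s[i];
    return r;
}

/* sum2 style: scalar arrays of n ints, accessed in groups of four.
   The loop over tid plays the role of the work-items. */
void sum2_model(const int *a, const int *b, int *c, size_t n) {
    assert(n % 4 == 0); /* the kernel assumes a multiple of 4 elements */
    for (size_t tid = 0; tid < n / 4; ++tid)
        vstore4_model(add4(vload4_model(tid, a), vload4_model(tid, b)), tid, c);
}

/* sum3 style: the same data viewed as n4 four-int vectors. */
void sum3_model(const int4_model *a, const int4_model *b,
                int4_model *c, size_t n4) {
    for (size_t tid = 0; tid < n4; ++tid)
        c[tid] = add4(a[tid], b[tid]);
}
```

      Running both models on the same input bytes gives byte-identical output, so the int* and int4* kernels differ only in how the kernel views the buffer, not in how the host has to lay out the data. One difference that does matter on the device: per the OpenCL specification, vload4/vstore4 only require the pointer to be aligned to the element type (int), whereas dereferencing an int4* requires 16-byte alignment.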