Why do float and float4 kernels take almost the same time?

Discussion created by Fuxianjun on Aug 11, 2010
I am testing the addition of two long arrays (for example 40000 elements each, i.e. a[40000] + b[40000]) and repeating the same addition many times (for example 100000 times). However, the two kernels below take the same time. Why?

My GPU is an ATI Radeon HD 5700 Series.

// workitem number is 10000
__kernel void add(__global float4 * a, __global float4 * b, __global float4 * c)
{
    int i = get_global_id(0);
    c[i] = a[i] * b[i];
}

// workitem number is 40000
__kernel void add(__global float * a, __global float * b, __global float * c)
{
    int i = get_global_id(0);
    c[i] = a[i] * b[i];
}
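
For reference, the two launches above can be compared on paper. The following is a minimal host-side C sketch, not part of the original test: `bytes_moved` and `float_ops` are hypothetical helpers introduced only for this illustration, and the element sizes (4 bytes for `float`, 16 bytes for `float4`) are passed in as plain numbers. It only counts work; it does not measure anything on the GPU.

```c
#include <stddef.h>

/* Hypothetical helper: bytes read and written by one launch of a kernel
   that loads a[i] and b[i] and stores c[i] for each work-item. */
size_t bytes_moved(size_t work_items, size_t bytes_per_element)
{
    const size_t arrays = 3; /* a and b are read, c is written */
    return work_items * bytes_per_element * arrays;
}

/* Hypothetical helper: scalar float operations performed by one launch,
   counting one operation per float component of each element. */
size_t float_ops(size_t work_items, size_t floats_per_element)
{
    return work_items * floats_per_element;
}
```

With these helpers, `bytes_moved(10000, 16)` for the float4 kernel and `bytes_moved(40000, 4)` for the float kernel both give 480000 bytes, and `float_ops(10000, 4)` and `float_ops(40000, 1)` both give 40000 operations. So both launches do the same total amount of work per pass. If the kernels are limited by memory traffic rather than arithmetic (an assumption, not something measured here), similar run times would be plausible.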