I am testing the addition of two long arrays (for example, 40000 elements each, i.e. a[40000] + b[40000]), and I repeat the same addition many times (for example, 100000 times). However, the two kernels below take the same amount of time. Why?
My GPU is an ATI Radeon HD 5700 Series.
// work-item count is 10000
__kernel void add(__global float4 *a, __global float4 *b, __global float4 *c)
{
    int i = get_global_id(0);
    c[i] = a[i] * b[i];
}

// work-item count is 40000
__kernel void add(__global float *a, __global float *b, __global float *c)
{
    int i = get_global_id(0);
    c[i] = a[i] * b[i];
}
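For reference, the two work-item counts follow directly from the array length: 40000 floats are 10000 float4 elements. The plain C sketch below (no OpenCL calls; `work_items` and `bytes_per_launch` are hypothetical helper names, not part of any API) spells out that arithmetic, and also shows that, assuming each work-item reads one element of a and b and writes one element of c, both variants move the same number of bytes per launch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-items needed when each one
 * handles `width` floats (1 for float, 4 for float4). */
static size_t work_items(size_t n_floats, size_t width)
{
    /* This sketch assumes the array length is an exact multiple of width. */
    assert(width > 0 && n_floats % width == 0);
    return n_floats / width;
}

/* Hypothetical helper: bytes read from a and b plus bytes written to c
 * in one kernel launch, given the work-item count and element width. */
static size_t bytes_per_launch(size_t n_items, size_t width)
{
    return 3 * n_items * width * sizeof(float);
}
```

Calling `work_items(40000, 4)` and `work_items(40000, 1)` gives the 10000 and 40000 from the kernel comments, and `bytes_per_launch` returns the same value for both, so any timing comparison between the two kernels is over an identical amount of memory traffic.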