# Archives Discussions

Journeyman III

Hi,

I tried matrix addition (h = 1024, w = 1024) using a 2-dimensional NDRange with a global work size of {h/4, w/4}.

The kernel code:

__kernel void add(__global float4 *c, __global const float4 *a, __global const float4 *b, int h, int w)

{

int i = get_global_id(0);

int j = get_global_id(1);

w = w/4;

c[i*w+j] = a[i*w+j] + b[i*w+j];

}

The problem is that the result is not correct: it did not add all the elements in the matrix. Can someone point out what's wrong with the kernel code?

3 Replies
Journeyman III

I think the problem lies in the global work size: shouldn't it be {h, w/4} instead? With your indexing and a size of {h/4, w/4}, only the top quarter of the matrix is computed.

EDIT: Another thing: in the expression c[i*w+j] = a[i*w+j] + b[i*w+j], the i and j variables should be swapped: c[j*w+i] = a[j*w+i] + b[j*w+i] (assuming i is the horizontal coordinate and j the vertical one).
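To make the "top quarter" point concrete, here is a rough CPU-side sketch in plain C (not OpenCL) that emulates the NDRange and counts how many float4 slots the original indexing i*(w/4)+j would touch. The function name count_covered and its parameters are made up for illustration; gs0 and gs1 stand in for the two global work sizes.

```c
#include <stddef.h>

/* Emulates a 2D NDRange on the CPU: each work-item (i, j) marks the float4
   slot i*w4 + j that the kernel in the question would write.
   Returns the number of distinct slots written.
   `covered` must hold at least h * (w / 4) bytes.
   All names here are illustrative, not part of any OpenCL API. */
size_t count_covered(size_t h, size_t w, size_t gs0, size_t gs1,
                     unsigned char *covered)
{
    size_t w4 = w / 4;          /* row length in float4 elements */
    size_t total = h * w4;      /* float4 slots in the whole matrix */
    size_t written = 0;

    for (size_t k = 0; k < total; ++k)
        covered[k] = 0;

    for (size_t i = 0; i < gs0; ++i) {        /* plays get_global_id(0) */
        for (size_t j = 0; j < gs1; ++j) {    /* plays get_global_id(1) */
            size_t idx = i * w4 + j;
            if (idx < total && !covered[idx]) {
                covered[idx] = 1;
                ++written;
            }
        }
    }
    return written;
}
```

For h = w = 1024 the matrix has 1024 * 256 = 262144 float4 slots. A global size of {256, 256} reaches 65536 of them (one quarter), while {1024, 256} reaches all of them.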

Exemplar

Try writing a simple single-float version first. You also don't need to pass h and w; use get_global_size() instead.
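As a rough idea of what that scalar version boils down to, here is a plain-C stand-in (a sketch, not an actual kernel). The helper names add_scalar_item and add_scalar are hypothetical; gid0, gid1 and gsize1 play the roles of get_global_id(0), get_global_id(1) and get_global_size(1), assuming one work-item per float and a global size of {h, w}.

```c
#include <stddef.h>

/* Body of a hypothetical scalar kernel for a single work-item.
   gid0/gid1 stand in for get_global_id(0)/get_global_id(1),
   gsize1 for get_global_size(1), i.e. the number of columns. */
static void add_scalar_item(float *c, const float *a, const float *b,
                            size_t gid0, size_t gid1, size_t gsize1)
{
    size_t idx = gid0 * gsize1 + gid1;   /* row-major: row * width + column */
    c[idx] = a[idx] + b[idx];
}

/* CPU stand-in for launching a 2D NDRange of {h, w} work-items. */
void add_scalar(float *c, const float *a, const float *b, size_t h, size_t w)
{
    for (size_t i = 0; i < h; ++i)
        for (size_t j = 0; j < w; ++j)
            add_scalar_item(c, a, b, i, j, w);
}
```

Note that the width never has to be passed to the per-item body as a separate argument: it is simply the size of dimension 1 of the launch.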

Journeyman III

I guess that if you use row-major ordering the algorithm is correct, but since you're using the float4 data type you have to use float4 indexing. Maybe this will work?

__kernel void add(__global float4 *c, __global const float4 *a, __global const float4 *b)

{

int i = get_global_id(0);

int j = get_global_id(1);

int w = get_global_size(1); // row length in float4 elements (dimension 1 indexes columns)

c[i*w+j].x = a[i*w+j].x + b[i*w+j].x;

c[i*w+j].y = a[i*w+j].y + b[i*w+j].y;

c[i*w+j].z = a[i*w+j].z + b[i*w+j].z;

c[i*w+j].w = a[i*w+j].w + b[i*w+j].w;

}
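For anyone who wants to check the float4 indexing without a GPU, the same logic can be emulated on the CPU. This is only a sketch: float4_t is a plain-C stand-in for OpenCL's float4, add_float4 is a made-up name, and it assumes a global size of {rows, cols4} where cols4 is the matrix width divided by 4, so the row stride equals the size of dimension 1.

```c
#include <stddef.h>

/* Plain-C stand-in for OpenCL's float4; illustrative only. */
typedef struct { float x, y, z, w; } float4_t;

/* CPU emulation of the float4 kernel over a {rows, cols4} NDRange.
   cols4 is the row length in float4 elements (matrix width / 4). */
void add_float4(float4_t *c, const float4_t *a, const float4_t *b,
                size_t rows, size_t cols4)
{
    for (size_t i = 0; i < rows; ++i) {         /* plays get_global_id(0) */
        for (size_t j = 0; j < cols4; ++j) {    /* plays get_global_id(1) */
            size_t idx = i * cols4 + j;         /* stride = get_global_size(1) */
            c[idx].x = a[idx].x + b[idx].x;
            c[idx].y = a[idx].y + b[idx].y;
            c[idx].z = a[idx].z + b[idx].z;
            c[idx].w = a[idx].w + b[idx].w;
        }
    }
}
```

In real OpenCL C the four component lines can be written as a single vector addition, c[idx] = a[idx] + b[idx], since float4 supports element-wise operators.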