

rolandman99
Journeyman III

Matrix add problem

Hi,

I tried matrix addition (h = 1024, w = 1024) using a 2-dimensional NDRange with a global work size of {h/4, w/4}.

 

The kernel code:

__kernel void add(__global float4 *c, __global float4 *a, __global float4 *b, int h, int w)
{
    int i = get_global_id(0);
    int j = get_global_id(1);
    w = w/4;    // row length in float4 units
    c[i*w+j] = a[i*w+j] + b[i*w+j];
}

The problem is that the result is not correct: not all elements of the matrix get added. Can someone point out what's wrong with the kernel code?

 

3 Replies
pulec
Journeyman III

I think the problem lies in the global work size: shouldn't it be {h, w/4} instead? With your algorithm and that size, only the top quarter of the matrix is computed.

EDIT: One more thing: the i and j variables in the expression c[i*w+j] = a[i*w+j] + b[i*w+j] should be swapped, giving c[j*w+i] = a[j*w+i] + b[j*w+i] (assuming i is the horizontal coordinate and j the vertical one).
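
To check the "top quarter" claim without a GPU, here is a rough host-side sketch in plain C (not OpenCL). It replays the original index expression i*w+j over a global work size and counts how many distinct float4 slots get written. The function name count_covered and its parameters gs0/gs1 (the two global work size dimensions) are made up for this illustration, and it assumes w is a multiple of 4.

```c
#include <assert.h>
#include <stdlib.h>

/* Emulates a 2-D NDRange on the host: evaluates the index i*w4 + j for every
 * (i, j) in a gs0 x gs1 grid and counts how many distinct float4 slots of an
 * h x w float matrix are written. Hypothetical helper, not an OpenCL API. */
size_t count_covered(size_t h, size_t w, size_t gs0, size_t gs1)
{
    assert(w % 4 == 0);                 /* rows must split evenly into float4 */
    size_t w4 = w / 4;                  /* row length in float4 units */
    size_t total = h * w4;              /* float4 slots in the whole matrix */
    unsigned char *hit = calloc(total, 1);
    assert(hit != NULL);

    size_t covered = 0;
    for (size_t i = 0; i < gs0; ++i) {          /* stands in for get_global_id(0) */
        for (size_t j = 0; j < gs1; ++j) {      /* stands in for get_global_id(1) */
            size_t idx = i * w4 + j;
            if (idx < total && !hit[idx]) {     /* ignore out-of-range writes */
                hit[idx] = 1;
                ++covered;
            }
        }
    }
    free(hit);
    return covered;
}
```

For the sizes in the question, count_covered(1024, 1024, 256, 256) returns 65536 of the 262144 float4 slots, i.e. a quarter of the matrix, while count_covered(1024, 1024, 1024, 256) returns all 262144.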

nou
Exemplar

First try writing a simple single-float version. Also, you don't need to pass h and w; use get_global_size() instead.
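
In the same spirit, a scalar version on the host gives you something to compare the GPU output against. This is only a sketch in plain C, not a kernel: the two loops stand in for get_global_id(0) and get_global_id(1), and h and w play the role of get_global_size(0) and get_global_size(1). The names add_reference and count_mismatches are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side scalar reference: one float per "work-item", row-major layout.
 * Hypothetical helper for checking results, not part of OpenCL. */
void add_reference(float *c, const float *a, const float *b, size_t h, size_t w)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < h; ++i) {            /* row */
        for (size_t j = 0; j < w; ++j) {        /* column */
            c[i * w + j] = a[i * w + j] + b[i * w + j];
        }
    }
}

/* Returns how many of the n elements in gpu[] differ from ref[]. */
size_t count_mismatches(const float *gpu, const float *ref, size_t n)
{
    size_t bad = 0;
    for (size_t k = 0; k < n; ++k) {
        if (gpu[k] != ref[k]) {
            ++bad;
        }
    }
    return bad;
}
```

After reading the result buffer back from the device, a nonzero count_mismatches against the add_reference output tells you how many elements the kernel missed or got wrong.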

fpaboim
Journeyman III

I guess that if you use row-major ordering the algorithm is correct, but if you're using the float4 data type you have to use float4 indexing. Maybe this will work?

 

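__kernel void add(__global float4 *c, __global float4 *a, __global float4 *b)
{
    int i = get_global_id(0);    // row index
    int j = get_global_id(1);    // column index, in float4 units
    int w = get_global_size(1);  // row length in float4 units (extent of the j dimension)
    // component-wise add of the four floats packed in each float4
    c[i*w+j].x = a[i*w+j].x + b[i*w+j].x;
    c[i*w+j].y = a[i*w+j].y + b[i*w+j].y;
    c[i*w+j].z = a[i*w+j].z + b[i*w+j].z;
    c[i*w+j].w = a[i*w+j].w + b[i*w+j].w;
}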

 
