3 Replies Latest reply on Dec 19, 2010 6:21 PM by fpaboim

    Matrix add problem

    rolandman99

      Hi,

      I tried matrix addition (h = 1024, w = 1024) using a 2-dimensional NDRange with a global work size of {h/4, w/4}.

       

      The kernel code:

      __kernel void add(__global float4 *c, __global const float4 *a, __global const float4 *b, int h, int w)

      {

          int i = get_global_id(0);

          int j = get_global_id(1);

          w = w/4;

          c[i*w+j] = a[i*w+j] + b[i*w+j];

      }

      The problem is that the result is not correct: not all elements of the matrix are added. Can someone point out what's wrong with the kernel code?

       

        • Matrix add problem
          pulec

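          I think the problem lies in the global work size: shouldn't it be {h, w/4} instead? Each work-item handles one float4 (four floats of a row), so only the width needs dividing by 4. With {h/4, w/4}, your kernel computes only the top quarter of the matrix.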

          EDIT: And another thing: the expression c[i*w+j] = a[i*w+j] + b[i*w+j] should have the i and j variables swapped, giving c[j*w+i] = a[j*w+i] + b[j*w+i] (assuming i is the horizontal and j the vertical coordinate).
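The coverage argument can be checked on the host without a GPU. The following is a minimal sketch in plain C, not the original poster's code: `covered_elements` is a hypothetical helper that emulates the 2-D NDRange with nested loops and counts how many distinct float4 elements the index expression `i*w4 + j` touches, assuming `i` is the row (dimension 0) and `j` the float4 column (dimension 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (host-side emulation, not an OpenCL API):
 * counts the distinct float4 elements written by a 2-D NDRange of
 * size {gs0, gs1} when each work-item writes index i*w4 + j. */
static size_t covered_elements(size_t gs0, size_t gs1, size_t h, size_t w)
{
    assert(w % 4 == 0);                 /* rows must pack into whole float4s */
    size_t w4 = w / 4;                  /* row stride in float4 elements */
    size_t total = h * w4;              /* float4 elements in the matrix */
    size_t count = 0;
    unsigned char *seen = calloc(total, 1);
    if (!seen)
        return 0;                       /* allocation failed: report nothing covered */

    for (size_t i = 0; i < gs0; ++i) {          /* plays get_global_id(0) */
        for (size_t j = 0; j < gs1; ++j) {      /* plays get_global_id(1) */
            size_t idx = i * w4 + j;
            if (idx < total && !seen[idx]) {
                seen[idx] = 1;
                ++count;
            }
        }
    }
    free(seen);
    return count;
}
```

For h = w = 1024, calling `covered_elements(256, 256, 1024, 1024)` counts 65536 of the 262144 float4 elements (one quarter), while `covered_elements(1024, 256, 1024, 1024)` counts all of them.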

          • Matrix add problem
            nou

            Try writing a simple single-float version first. Also, you don't need to pass h and w; use get_global_size() instead.
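A rough sketch of what such a single-float version could look like, emulated in plain C so it can be run on the host. The names `add_scalar` and `run_add_scalar` are hypothetical, and the `get_global_id`/`get_global_size` functions here are host stand-ins for the OpenCL built-ins, not the real ones. It assumes dimension 0 is the row and dimension 1 the column, so the row length comes from `get_global_size(1)` rather than from a kernel argument.

```c
#include <stddef.h>

/* Host stand-ins for the OpenCL built-ins (emulation only). */
static size_t g_id[2], g_size[2];
static size_t get_global_id(unsigned d)   { return g_id[d]; }
static size_t get_global_size(unsigned d) { return g_size[d]; }

/* Body of a scalar kernel: each work-item adds exactly one float. */
static void add_scalar(float *c, const float *a, const float *b)
{
    size_t i = get_global_id(0);     /* row */
    size_t j = get_global_id(1);     /* column */
    size_t w = get_global_size(1);   /* row length, taken from the NDRange */
    c[i * w + j] = a[i * w + j] + b[i * w + j];
}

/* Emulates launching the kernel with global work size {h, w}. */
static void run_add_scalar(float *c, const float *a, const float *b,
                           size_t h, size_t w)
{
    g_size[0] = h;
    g_size[1] = w;
    for (g_id[0] = 0; g_id[0] < h; ++g_id[0])
        for (g_id[1] = 0; g_id[1] < w; ++g_id[1])
            add_scalar(c, a, b);
}
```

In this form the global work size is simply {h, w}, with no division by 4, which makes it easier to confirm the indexing before moving to float4.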

            • Matrix add problem
              fpaboim

              I guess that if you use row-major ordering the algorithm is correct, but if you're using the float4 data type you have to use float4 indexing. Maybe this will work?

               

              __kernel void add(__global float4 *c, __global const float4 *a, __global const float4 *b)

              {

                  int i = get_global_id(0);

                  int j = get_global_id(1);

                  int w = get_global_size(1);   // row stride in float4 elements (work-items in dimension 1)

                  c[i*w+j].x = a[i*w+j].x + b[i*w+j].x;

                  c[i*w+j].y = a[i*w+j].y + b[i*w+j].y;

                  c[i*w+j].z = a[i*w+j].z + b[i*w+j].z;

                  c[i*w+j].w = a[i*w+j].w + b[i*w+j].w;

              }
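To see that per-component float4 indexing reaches every element once the global work size is {h, w/4}, the kernel can be emulated on the host. This is only a sketch under assumptions: `float4` below is a plain C struct standing in for the OpenCL type, and `run_add_float4` is a hypothetical helper whose loops play the role of the NDRange.

```c
#include <stddef.h>

/* Stand-in for OpenCL's float4 (plain struct, host emulation only). */
typedef struct { float x, y, z, w; } float4;

/* Emulates the per-component kernel over a global work size {h, w/4}. */
static void run_add_float4(float4 *c, const float4 *a, const float4 *b,
                           size_t h, size_t w)
{
    size_t w4 = w / 4;                       /* plays get_global_size(1) */
    for (size_t i = 0; i < h; ++i) {         /* plays get_global_id(0) */
        for (size_t j = 0; j < w4; ++j) {    /* plays get_global_id(1) */
            size_t idx = i * w4 + j;         /* index in float4 units */
            c[idx].x = a[idx].x + b[idx].x;
            c[idx].y = a[idx].y + b[idx].y;
            c[idx].z = a[idx].z + b[idx].z;
            c[idx].w = a[idx].w + b[idx].w;
        }
    }
}
```

In OpenCL C itself, the `+` operator on float4 operands is already component-wise, so the four assignments compute the same result as the single statement `c[idx] = a[idx] + b[idx];`.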