
Inner looping structure with OpenCL

Discussion created by akhal on Jul 31, 2011
Latest reply on Aug 1, 2011 by nou

Hello

I am new to OpenCL and want to parallelize some code that performs LU factorization. The exact loop structure is shown below:

    for(int k = 0; k < N-1; k++)
    {
        for(int i = k+1; i < N; i++)
            S[i*N + k] = S[i*N + k] / S[k*N + k];

        for(int j = k+1; j < N; j++)
            for(int i = k+1; i < N; i++)
                S[i*N + j] -= S[i*N + k] * S[k*N + j];
    }
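For reference, here is a minimal self-contained sketch of the same serial loop wrapped in a function, so it can be used as the baseline when checking the kernel output. The function name `lu_serial` and the `float` element type are assumptions made for illustration. The loop itself is unchanged: it works in place on a row-major N x N matrix and does no pivoting, so it assumes every diagonal element `S[k*N + k]` is nonzero when it is used as a divisor.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU factorization without pivoting.
 * S is a row-major N x N matrix. On return, the strictly lower
 * triangle holds the multipliers (L, with an implied unit diagonal)
 * and the upper triangle including the diagonal holds U.
 * Name and element type are hypothetical, chosen for this sketch. */
void lu_serial(float *S, int N)
{
    assert(S != NULL && N > 0);

    for (int k = 0; k < N - 1; k++) {
        /* Step 1: scale column k below the diagonal by the pivot. */
        for (int i = k + 1; i < N; i++)
            S[i * N + k] = S[i * N + k] / S[k * N + k];

        /* Step 2: update the trailing submatrix. This reads the
         * column k values written in step 1 and row k of the
         * current step, so step 1 must be complete first. */
        for (int j = k + 1; j < N; j++)
            for (int i = k + 1; i < N; i++)
                S[i * N + j] -= S[i * N + k] * S[k * N + j];
    }
}
```

As a small check, calling `lu_serial` on the 3 x 3 matrix `{2,1,1, 4,3,3, 8,7,9}` leaves `{2,1,1, 2,1,1, 4,3,2}` in the array, i.e. multipliers 2, 4 and 3 below the diagonal and U = `{2,1,1; 0,1,1; 0,0,2}` on and above it.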

I have written a simple OpenCL kernel that uses individual work-items (no grouping). It is as follows:

      int IDx = get_global_id(0);
      int IDy = get_global_id(1);

      for(int k = 0; k < n-1; k++)
      {
        barrier(CLK_GLOBAL_MEM_FENCE);

        if(IDy > k && IDx == k)
            matrix[IDy*n + IDx] = matrix[IDy*n + IDx] / matrix[IDx*n + IDx];

        barrier(CLK_GLOBAL_MEM_FENCE);

        for(int j = k+1; j < n; j++)
        {
            if(IDy > k && IDx == j)
                matrix[IDy*n + IDx] -= matrix[IDy*n + k] * matrix[k*n + IDx];
        }
      }

 

But I don't get correct results when compared to the serial code. This is my own attempt at an OpenCL kernel, and I am still learning how the data-parallel scheme in OpenCL works. Can you point out what I am doing wrong in the kernel?
