
OpenCL compilation for CPU crashes.

Question asked by miktis on Mar 6, 2013
Latest reply on Mar 8, 2013 by himanshu.gautam
I have code that produces correct results when I test it on my Radeon HD7870.
It also compiles for GPUs in KernelAnalyser2.
The kernel myTask is enqueued as a task (clEnqueueTask).
The code computes y = L*R*x, where L and R are sparse matrices.
The code is not efficient, but that is not the point here.
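
For anyone reproducing this, a host-side reference in plain C can help check the kernel's output. The sketch below assumes R and L are stored in compressed sparse row (CSR) form, which matches how the kernel indexes RPtr/RIdx and LPtr/LIdx, and it mirrors the kernel's fixed 1024-element intermediate buffer. The names csr_spmv and ref_task are made up for illustration and are not part of my software.

```c
#include <assert.h>

/* y = A * x for a CSR matrix A with n_rows rows.
   ptr has n_rows + 1 entries; idx and val hold the column
   indices and values of the nonzeros, row by row. */
static void csr_spmv(const float *val, const int *idx, const int *ptr,
                     const float *x, float *y, int n_rows)
{
    for (int i = 0; i < n_rows; i++) {
        float acc = 0.0f;
        for (int j = ptr[i]; j < ptr[i + 1]; j++)
            acc += val[j] * x[idx[j]];
        y[i] = acc;
    }
}

/* Hypothetical host-side reference for myTask:
   z = R * x (cols entries), then y = L * z (rows entries). */
void ref_task(const float *R, const int *RIdx, const int *RPtr,
              const float *L, const int *LIdx, const int *LPtr,
              const float *x, float *y, int rows, int cols)
{
    float z[1024];                      /* same size as the kernel's __local z */
    assert(cols >= 0 && cols <= 1024);  /* the kernel itself does not check this */
    csr_spmv(R, RIdx, RPtr, x, z, cols);
    csr_spmv(L, LIdx, LPtr, z, y, rows);
}
```

Comparing y from ref_task with the buffer read back from the GPU run gives a baseline that does not depend on the OpenCL compiler.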

I use AMD APP 2.8 and Catalyst 13.1 on Windows 7 Ultimate 64-bit with an i7-3820.
When I select the CPU as the device, the compiler crashes inside clBuildProgram (in amdocl64.dll).
KernelAnalyser2 also crashes.

I have made some observations with KernelAnalyser2:
      1. The code compiles if I comment out myKernel.
      2. The code compiles if I uncomment the line i = get_global_id(0)
      3. The code compiles if I don't use the second for loop (for(i = 0; i < rows; i++)),
          even though results are not correct.
The same observations hold when I compile from my own software.
Could anyone try to reproduce these findings with KernelAnalyser2?

The code is as follows:
__kernel void myTask(__global float *R, __global int *RIdx, __global int *RPtr,
      __global float *L, __global int *LIdx, __global int *LPtr,
      __global float *x, __global float *y, int rows, int cols)
{
      __local float z[1024];
      int i = 0;
      //i = get_global_id(0);
      for(; i < cols; i++)
      {
           float acc = 0.0f;
           for (int j = RPtr[i]; j < RPtr[i+1]; j++)
           {
                acc += R[j] * x[RIdx[j]];
           }
           z[i] = acc;
      }
      for(i = 0; i < rows; i++)
      {
           float acc = 0.0f;
           for (int j = LPtr[i]; j < LPtr[i+1]; j++)
           {
                acc += L[j] * z[LIdx[j]];
           }
           y[i] = acc;
      }
}
__kernel void myKernel(__global float *a, __global float *b, __global float *c)
{
      int i = get_global_id(0);
      a[i] = b[i] + c[i];
}
