2 Replies Latest reply on Mar 8, 2013 3:08 AM by himanshu.gautam

    OpenCL compilation for CPU crashes.

    miktis
      I have code that produces correct results when I run it on my Radeon HD7870.
      It also compiles for GPUs in KernelAnalyser2.
      The kernel myTask is run as a task (clEnqueueTask).
      The code computes y=L*R*x, where L and R are sparse matrices.
      The code is not efficient, but that is not my point.

      I use AMD APP SDK 2.8 and Catalyst 13.1 on Windows 7 Ultimate 64-bit with an i7-3820.
      When I select the CPU as the device, the compiler crashes inside clBuildProgram (in amdocl64.dll).
      KernelAnalyser2 also crashes.

      I have made some observations with KernelAnalyser2:
            1. The code compiles if I comment out myKernel.
            2. The code compiles if I uncomment the line i = get_global_id(0) in myTask.
            3. The code compiles if I remove the second for loop (for(i = 0; i < rows; i++)),
                although the results are then incorrect.
      The same observations hold when I compile using my own software.
      Could anyone try to reproduce these findings with KernelAnalyser2?

      The code is as follows:
      __kernel void myTask(__global float *R, __global int *RIdx, __global int *RPtr,
            __global float *L, __global int *LIdx, __global int *LPtr,
            __global float *x, __global float *y, int rows, int cols)
      {
            __local float z[1024];
            int i = 0;
            //i = get_global_id(0);
            for(; i < cols; i++)
            {
                 float acc = 0.0f;
                 for (int j = RPtr[i]; j < RPtr[i+1]; j++)
                 {
                      acc += R[j] * x[RIdx[j]];
                 }
                 z[i] = acc;
            }
            for(i = 0; i < rows; i++)
            {
                 float acc = 0.0f;
                 for (int j = LPtr[i]; j < LPtr[i+1]; j++)
                 {
                      acc += L[j] * z[LIdx[j]];
                 }
                 y[i] = acc;
            }
      }
      __kernel void myKernel(__global float *a, __global float *b, __global float *c)
      {
            int i = get_global_id(0);
            a[i] = b[i] + c[i];
      }
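
      For checking device output, the computation in myTask can be sketched as plain host-side C. This is only a rough reference, not part of the original program: the function names csr_matvec and reference_task are made up for illustration, the matrices are assumed to be stored in CSR form (as the Ptr/Idx arrays suggest), and the scratch buffer z is passed in by the caller instead of being a fixed 1024-element __local array.

```c
#include <assert.h>

/* y = A * x for a sparse matrix A in (assumed) CSR layout.
   val: nonzero values, idx: column index of each nonzero,
   ptr: row start offsets, length n_rows + 1. */
void csr_matvec(const float *val, const int *idx, const int *ptr,
                const float *x, float *y, int n_rows)
{
    assert(n_rows >= 0);
    for (int i = 0; i < n_rows; i++) {
        float acc = 0.0f;
        for (int j = ptr[i]; j < ptr[i + 1]; j++)
            acc += val[j] * x[idx[j]];
        y[i] = acc;
    }
}

/* Hypothetical host reference for myTask: z = R * x, then y = L * z.
   As in the kernel, cols is the length of z (rows of R, columns of L)
   and rows is the length of y (rows of L).
   z is caller-provided scratch with at least cols elements. */
void reference_task(const float *R, const int *RIdx, const int *RPtr,
                    const float *L, const int *LIdx, const int *LPtr,
                    const float *x, float *y, float *z,
                    int rows, int cols)
{
    csr_matvec(R, RIdx, RPtr, x, z, cols);  /* first loop of myTask  */
    csr_matvec(L, LIdx, LPtr, z, y, rows);  /* second loop of myTask */
}
```

      Comparing y from reference_task with the buffer read back from the GPU run gives a baseline that does not depend on the CPU OpenCL compiler. Floating-point results may differ slightly between host and device, so a small tolerance is advisable for real data.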