1 Reply Latest reply on Sep 20, 2013 1:00 AM by himanshu.gautam

    Enqueuing a kernel on the CPU with only one work item

    bulibuta

      My setup creates a single context on one platform (AMD) containing two devices, the GPU and the CPU, and therefore two command queues.

      One algorithm is designed for the GPU and the other for the CPU. Or rather, the second one is not really designed for the CPU: it is highly sequential and gets very expensive as the problem dimensions grow, and since the GPU takes forever to process it with only one work-item, I decided to run it on the CPU instead.

      So I want to launch the sequential algorithm on the CPU queue and have it run on just one core, like this:

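              gws[0] = lws[0] = 1;   /* a single work-group holding a single work-item */
              rc = clEnqueueNDRangeKernel(cpu_queue,
                  atom, 1, NULL, gws, lws, 0, NULL, &ev[0]);
              if (rc != CL_SUCCESS) {
                  goto err;
              }
              clFlush(cpu_queue);              /* submit the command to the CPU device */
              rc = clWaitForEvents(1, &ev[0]); /* block until the kernel has finished */
              if (rc != CL_SUCCESS) {
                  goto err;
              }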

      This works fine for small matrix dimensions (64x32) but faults for large ones (1024x512 and beyond).
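
For scale: the post does not give the element type, but assuming single-precision `float` elements, going from 64x32 to 1024x512 is a 256-fold increase in element count, from 8 KiB to 2 MiB per matrix. The hypothetical helper `matrix_bytes` below only does that arithmetic:

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for a rows x cols matrix.
 * Assumes float elements; the post does not state the element type. */
size_t matrix_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(float);
}
```

With 4-byte floats, `matrix_bytes(64, 32)` is 8192 and `matrix_bytes(1024, 512)` is 2097152.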

      If I enqueue the same kernel on the GPU queue instead, there are no faults and the numeric results are correct.

      Any ideas? Is this not a good way to enqueue the kernel? Why does it work on the GPU but not on the CPU?