Archives Discussions

bulibuta · 09-19-2013

My setup creates one context on a single platform (AMD) with two devices, the GPU and the CPU, and therefore two command queues.

One algorithm is designed for the GPU and the other runs on the CPU. The second one is not really designed for the CPU, but it is strictly sequential and gets very expensive as the problem dimensions grow, so I decided to run it on the CPU because the GPU takes forever to process it with only one work-item.

So I want to start the sequential algorithm on the CPU queue and have it run on just one core, like this:

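        /* One work-group containing a single work-item. */
        gws[0] = lws[0] = 1;
        rc = clEnqueueNDRangeKernel(cpu_queue,
            atom, 1, NULL, gws, lws, 0, NULL, &ev[0]);
        if (rc != CL_SUCCESS) {
            goto err;
        }
        /* Submit the command, then block until the kernel has finished. */
        clFlush(cpu_queue);
        clWaitForEvents(1, &ev[0]);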

This works fine for small matrix dimensions (64x32), but faults for big ones (1024x512 and beyond).

If I enqueue the same kernel on the GPU queue instead, there are no faults and the numeric results are correct.

Any ideas? Is this not a good way to enqueue the kernel? Why does it work on the GPU but not on the CPU?

himanshu_gautam · 09-20-2013

The enqueue call itself looks fine; I don't see any fault in it. We probably need to see how the kernel handles this case.

Please share the system information and the sample code for further assistance.

Enqueuing a kernel on the CPU with only one work item