
possible bug found: *single* threaded race condition with clSetKernelArg + clEnqueueNDRangeKernel

Question asked by madshi on Jan 6, 2014
Latest reply on Jun 18, 2014 by dipak

First of all let me say a *BIG* thanks for finally adding support for the OpenCL 1.2 Direct3D9 interop extension. I've been waiting for this for years! And it seems to work just fine. Finally I can use OpenCL in my DirectShow video renderer. So I'm pretty happy...

 

I found one potential AMD OpenCL bug. Consider the following code:

 

[code]
bool success = (clSetKernelArg(KernelSplitNv12, 0, sizeof(cl_mem), &nv12plane0) == CL_SUCCESS) &&
               (clSetKernelArg(KernelSplitNv12, 1, sizeof(cl_mem), &YSurfaceCL) == CL_SUCCESS) &&
               (clEnqueueNDRangeKernel(ClDxvaQueue, KernelSplitNv12, 2, NULL, globalWsY, NULL, 0, NULL, NULL) == CL_SUCCESS) &&
               (clSetKernelArg(KernelSplitNv12, 0, sizeof(cl_mem), &nv12plane1) == CL_SUCCESS) &&
               (clSetKernelArg(KernelSplitNv12, 1, sizeof(cl_mem), &CbCrSurfaceCL) == CL_SUCCESS) &&
               (clEnqueueNDRangeKernel(ClDxvaQueue, KernelSplitNv12, 2, NULL, globalWsCbCr, NULL, 0, NULL, NULL) == CL_SUCCESS);
[/code]

 

I'm using this to copy the Y and CbCr planes of an NV12 D3D9 DXVA surface to separate D3D9 textures/surfaces for further processing. This code works fine most of the time, but once in a while it produces corrupted image output. After some digging I found that adding a call to clFinish() between the two kernel launches seems to fix the issue:

 

[code]
bool success = (clSetKernelArg(KernelSplitNv12, 0, sizeof(cl_mem), &nv12plane0) == CL_SUCCESS) &&
               (clSetKernelArg(KernelSplitNv12, 1, sizeof(cl_mem), &YSurfaceCL) == CL_SUCCESS) &&
               (clEnqueueNDRangeKernel(ClDxvaQueue, KernelSplitNv12, 2, NULL, globalWsY, NULL, 0, NULL, NULL) == CL_SUCCESS) &&
               (clFinish(ClDxvaQueue) == CL_SUCCESS) &&
               (clSetKernelArg(KernelSplitNv12, 0, sizeof(cl_mem), &nv12plane1) == CL_SUCCESS) &&
               (clSetKernelArg(KernelSplitNv12, 1, sizeof(cl_mem), &CbCrSurfaceCL) == CL_SUCCESS) &&
               (clEnqueueNDRangeKernel(ClDxvaQueue, KernelSplitNv12, 2, NULL, globalWsCbCr, NULL, 0, NULL, NULL) == CL_SUCCESS);
[/code]

 

It seems that the clSetKernelArg() calls for the second kernel launch are sometimes applied before the first kernel has run through, so the first launch ends up using the second launch's arguments and everything gets messed up. I can understand how an implementation could end up doing this, but the OpenCL documentation doesn't mention this behavior anywhere. The only race condition it mentions for clSetKernelArg() involves multi-threading, and everything here runs on a single thread. So this looks like a bug?
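
To make the suspected failure mode concrete, here is a toy model in plain C. It is not OpenCL and says nothing about how AMD's runtime is actually implemented: ToyKernel, ToyQueue, toy_set_arg, toy_enqueue, toy_flush and toy_run are all hypothetical names, and the integers 10/20/11/21 merely stand in for the four cl_mem handles. The model only contrasts two possible behaviors of a deferred queue: one that copies the kernel arguments at enqueue time, and one that reads them from the live kernel object when the command finally runs.

```c
/* Toy model only. None of these types or functions are OpenCL API. */
typedef struct {
    int args[2];              /* stand-ins for the two cl_mem arguments */
} ToyKernel;

typedef struct {
    const ToyKernel *kernel;  /* reference to the live kernel object */
    int snapshot[2];          /* copy of the arguments taken at enqueue time */
} ToyCommand;

typedef struct {
    ToyCommand cmds[8];
    int count;
} ToyQueue;

void toy_set_arg(ToyKernel *k, int index, int value) {
    k->args[index] = value;
}

/* Enqueue does not execute anything; it only records the command. */
void toy_enqueue(ToyQueue *q, const ToyKernel *k) {
    ToyCommand *c = &q->cmds[q->count++];
    c->kernel = k;
    c->snapshot[0] = k->args[0];
    c->snapshot[1] = k->args[1];
}

/* "Execute" the deferred commands: record which arguments each one sees.
 * use_snapshot != 0: arguments as they were at enqueue time.
 * use_snapshot == 0: arguments read late from the live kernel object. */
void toy_flush(const ToyQueue *q, int use_snapshot, int out[][2]) {
    for (int i = 0; i < q->count; i++) {
        const int *a = use_snapshot ? q->cmds[i].snapshot
                                    : q->cmds[i].kernel->args;
        out[i][0] = a[0];
        out[i][1] = a[1];
    }
}

/* Same call order as in the post: set, set, enqueue, set, set, enqueue. */
void toy_run(int use_snapshot, int out[2][2]) {
    ToyKernel k;
    ToyQueue q;
    q.count = 0;

    toy_set_arg(&k, 0, 10);   /* plays the role of nv12plane0 */
    toy_set_arg(&k, 1, 20);   /* plays the role of YSurfaceCL */
    toy_enqueue(&q, &k);

    toy_set_arg(&k, 0, 11);   /* plays the role of nv12plane1 */
    toy_set_arg(&k, 1, 21);   /* plays the role of CbCrSurfaceCL */
    toy_enqueue(&q, &k);

    toy_flush(&q, use_snapshot, out);
}
```

Calling toy_run(1, out) leaves each launch with the arguments that were set just before it was enqueued, which is what I expected from the real API. Calling toy_run(0, out) makes both launches see the second set of arguments, which would match the corruption described above if something similar happens in the driver.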

 

Thoughts?
