In my application, I have a processing thread that enqueues an OpenCL kernel that writes to an ID3D11Texture2D object.
Everything works fine in terms of correctness. I can successfully acquire the shared OpenCL-Direct3D11 resource via clEnqueueAcquireD3D11ObjectsKHR and render the texture on a different thread using the Direct3D11 API.
I noticed that when the application has a slightly bigger workload (more OpenCL kernels launched from multiple queues, more Direct3D11 drawing), all relevant API calls scale pretty nicely in terms of performance: they take approximately the same time as they used to. Well, all except one - the call to clEnqueueAcquireD3D11ObjectsKHR. It seems as if this call actually blocks on the host side while trying to acquire some shared resource. This does not make sense to me, as the acquisition should happen asynchronously on the device side.
So my questions are:
Hardware: AMD Radeon Pro WX7100
OS: Windows 10
Thank you for reporting it. I'll check with the OpenCL runtime team for their feedback about this observation. Meanwhile, could you please provide a reproducible test-case and share the driver information?
I'll be happy to provide you with more details. However, our software is very complex and has many external hardware dependencies, so reproducing the problem in your environment would be close to impossible.
Still, here are some more details that might help:
I've got feedback from the OpenCL runtime team. As per their reply, it is expected that clEnqueueAcquireD3D11ObjectsKHR may block the calling thread for some time. If CL_CONTEXT_INTEROP_USER_SYNC is not specified as CL_TRUE during context creation, clEnqueueAcquireD3D11ObjectsKHR itself provides the synchronization guarantee, so the OpenCL runtime has to stall the D3D pipeline before starting OpenCL execution.
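For anyone who wants to try the CL_CONTEXT_INTEROP_USER_SYNC route mentioned above, here is a minimal sketch of how the context property list could be laid out. It is not taken from the runtime team's reply. The helper name build_interop_props, the ctx_prop_t typedef and the SKETCH_* macros are hypothetical and exist only to keep the snippet compilable without the OpenCL SDK. The numeric values are the ones I believe the Khronos headers (CL/cl.h and CL/cl_d3d11.h) define, so real code should include those headers and use CL_CONTEXT_PLATFORM, CL_CONTEXT_D3D11_DEVICE_KHR and CL_CONTEXT_INTEROP_USER_SYNC directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cl_context_properties, which the OpenCL headers
   typedef as intptr_t. Hypothetical name, used only in this sketch. */
typedef intptr_t ctx_prop_t;

/* Assumed token values, mirroring CL/cl.h and CL/cl_d3d11.h.
   Prefer the real header constants in production code. */
#define SKETCH_CL_CONTEXT_PLATFORM          0x1084
#define SKETCH_CL_CONTEXT_INTEROP_USER_SYNC 0x1085
#define SKETCH_CL_CONTEXT_D3D11_DEVICE_KHR  0x401D

/* Three key/value pairs plus the terminating zero. */
enum { INTEROP_PROP_COUNT = 7 };

/* Fills `out` with a zero-terminated property list for a context that
   shares `d3d11_device` (an ID3D11Device*) on `platform` (a cl_platform_id).
   A non-zero `user_sync` requests CL_CONTEXT_INTEROP_USER_SYNC = CL_TRUE,
   i.e. the application takes over CL/D3D11 synchronization. */
void build_interop_props(void *platform, void *d3d11_device, int user_sync,
                         ctx_prop_t out[INTEROP_PROP_COUNT])
{
    size_t i = 0;

    assert(out != NULL);

    out[i++] = SKETCH_CL_CONTEXT_PLATFORM;
    out[i++] = (ctx_prop_t)platform;
    out[i++] = SKETCH_CL_CONTEXT_D3D11_DEVICE_KHR;
    out[i++] = (ctx_prop_t)d3d11_device;
    out[i++] = SKETCH_CL_CONTEXT_INTEROP_USER_SYNC;
    out[i++] = user_sync ? 1 : 0;   /* CL_TRUE is 1, CL_FALSE is 0 */
    out[i]   = 0;                   /* list terminator */
}
```

The resulting array would then be passed as the first argument of clCreateContext (cast to const cl_context_properties *). Note the trade-off: per the cl_khr_d3d11_sharing extension, once user sync is enabled the application itself must guarantee that all Direct3D 11 work touching the shared texture has completed before calling clEnqueueAcquireD3D11ObjectsKHR, and that the OpenCL work has completed before Direct3D 11 uses the texture again after the release. Whether this actually removes the host-side stall on a given driver is something to measure, not something this thread confirms.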