In my application, I have a processing thread that enqueues an OpenCL kernel that writes to an ID3D11Texture2D object.
Everything works fine in terms of correctness. I can successfully acquire the shared OpenCL-Direct3D11 resource via clEnqueueAcquireD3D11ObjectsKHR and render the texture on a different thread using the Direct3D11 API.
I noticed that when the application has a slightly bigger workload (more OpenCL kernel launches from multiple queues, more Direct3D11 drawing), all relevant API calls scale pretty nicely in terms of performance: each takes approximately the same time as it used to. All except one, that is: the call to clEnqueueAcquireD3D11ObjectsKHR. It seems as if this call actually blocks on the host side while trying to acquire some shared resource. This does not make sense to me, as the acquisition should happen asynchronously on the device side.
So my question is:
- What can affect the time it takes for clEnqueueAcquireD3D11ObjectsKHR to complete? What exactly can cause this function to block the calling thread for a considerable amount of time (3-20 ms)?
Hardware: AMD Radeon Pro WX7100
OS: Windows 10