OpenCL

elad · 04-18-2019

In my application, I have a processing thread that enqueues an OpenCL kernel that writes to an ID3D11Texture2D object.

Everything works fine in terms of correctness. I can successfully acquire the shared OpenCL-Direct3D11 resource via clEnqueueAcquireD3D11ObjectsKHR and render the texture on a different thread using the Direct3D11 API.

I noticed that when the application has a slightly bigger workload (more OpenCL kernel launches from multiple queues, more Direct3D11 drawing), all relevant API calls scale nicely in terms of performance: each takes approximately the same time as before. All except one, that is: the call to clEnqueueAcquireD3D11ObjectsKHR. It seems as if this call actually blocks on the host side while trying to acquire some shared resource. This does not make sense to me, as the acquisition should happen asynchronously on the device side.

So my questions are:

What can affect the time it takes for clEnqueueAcquireD3D11ObjectsKHR to complete? What exactly can cause this function to block the calling thread for a considerable amount of time (3-20 ms)?
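To tell host-side blocking apart from device-side latency, the call can be wrapped in a wall-clock timer on the calling thread. The following is only a minimal sketch in C++: `time_host_call_ms` and `exceeds_budget` are hypothetical helper names (not part of OpenCL), and the budget value is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `call` once on the current thread and returns
// the elapsed wall-clock time in milliseconds.
inline double time_host_call_ms(const std::function<void()>& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: true when the measured time is above the budget that
// the caller considers acceptable for a non-blocking enqueue.
inline bool exceeds_budget(double elapsed_ms, double budget_ms) {
    assert(budget_ms >= 0.0);
    return elapsed_ms > budget_ms;
}

// Intended use (assumes `queue`, `shared_tex` and the extension function
// pointer were obtained elsewhere; not compiled here):
//
//   cl_event acquired = nullptr;
//   double ms = time_host_call_ms([&] {
//       clEnqueueAcquireD3D11ObjectsKHR(queue, 1, &shared_tex,
//                                       0, nullptr, &acquired);
//   });
//   if (exceeds_budget(ms, 1.0)) { /* log the stall */ }
```

Comparing this host-side figure with the event's profiling timestamps (if the queue was created with profiling enabled) would show whether the delay sits in the enqueue call itself or in the device-side execution of the command.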

Additional Info:
Hardware is AMD Radeon Pro WX7100
OS: Windows 10

dipak · 04-22-2019

Hi elad,

Thank you for reporting it. I'll check with the OpenCL runtime team for their feedback about this observation. Meanwhile, could you please provide a reproducible test-case and share the driver information?

Thanks.

elad · 04-22-2019

Hi dipak,

I'll be happy to provide you with more details. However, our software is very complex and has many external hardware dependencies, so reproducing the problem in your environment will be close to impossible.

However, here are some more details that might help:

Driver: AMD Radeon (TM) Pro WX 7100 Graphics
Version: 24.20.12024.10
In the worst case we have 2 distinct instances of ID3D11DeviceContext:
- One ID3D11DeviceContext instance interoperates with an OpenCL context; it basically renders 2 textures that are written by 2 distinct OpenCL queues.
- The other ID3D11DeviceContext has a totally different task: it periodically fills an ID3D11Buffer with graphical content that is sent to an external FPGA using DirectGMA transfers.
The problem with clEnqueueAcquireD3D11ObjectsKHR seems to appear only when the second (unrelated) ID3D11DeviceContext is working.
Note that having 2 instances of ID3D11DeviceContext is a necessity: I was not able to create a DirectGMA-resident buffer with an ID3D11DeviceContext that interoperates with OpenCL, but that is a different story.
Removing the call to clEnqueueAcquireD3D11ObjectsKHR seems to solve the problem (although skipping it is supposed to produce an error when CL_CONTEXT_INTEROP_USER_SYNC is CL_TRUE). For now I do not notice artifacts on the shared texture with the synchronization gone, but I fear they will appear in some unforeseen scenario.

dipak · 04-22-2019

Thank you for sharing the above details. I've already reported it to the concerned team. Once I get their feedback, I'll come back to you.

Thanks.

dipak · 04-24-2019

I've got feedback from the OpenCL runtime team. According to their reply, it is expected that clEnqueueAcquireD3D11ObjectsKHR may block the calling thread for some time. If CL_CONTEXT_INTEROP_USER_SYNC is not specified as CL_TRUE during context creation, clEnqueueAcquireD3D11ObjectsKHR provides the synchronization guarantee, and the OpenCL runtime has to stall the D3D pipeline before starting OpenCL execution.
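For illustration, here is a rough sketch of the property list that would opt in to user-managed synchronization at context creation. It is not taken from the runtime team's reply. To keep it compilable without the OpenCL SDK, the type and constants below are local stand-ins that mirror the values in the Khronos headers, and `make_interop_properties` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins so the sketch builds without the SDK. Real code should use
// <CL/cl.h> and <CL/cl_d3d11.h> instead of these definitions.
using cl_context_properties = std::intptr_t;
constexpr cl_context_properties kContextPlatform        = 0x1084; // CL_CONTEXT_PLATFORM
constexpr cl_context_properties kContextInteropUserSync = 0x1085; // CL_CONTEXT_INTEROP_USER_SYNC
constexpr cl_context_properties kContextD3D11DeviceKhr  = 0x401D; // CL_CONTEXT_D3D11_DEVICE_KHR
constexpr cl_context_properties kTrue  = 1;                       // CL_TRUE
constexpr cl_context_properties kFalse = 0;                       // CL_FALSE

// Hypothetical helper: builds the zero-terminated property list that is
// passed as the first argument of clCreateContext.
std::vector<cl_context_properties> make_interop_properties(
        const void* platform, const void* d3d11_device, bool user_sync) {
    assert(platform != nullptr && d3d11_device != nullptr);
    return {
        kContextPlatform, reinterpret_cast<cl_context_properties>(platform),
        kContextD3D11DeviceKhr, reinterpret_cast<cl_context_properties>(d3d11_device),
        kContextInteropUserSync, user_sync ? kTrue : kFalse,
        0  // terminator
    };
}

// Intended use (not compiled here):
//   auto props = make_interop_properties(platform, d3dDevice, true);
//   cl_context ctx = clCreateContext(props.data(), 1, &device,
//                                    nullptr, nullptr, &err);
```

With user sync enabled, the acquire and release calls are still required; what changes is that the application itself must make sure the Direct3D 11 work on the shared resource has completed before calling clEnqueueAcquireD3D11ObjectsKHR, for example by flushing the ID3D11DeviceContext and waiting on a D3D11_QUERY_EVENT query. Whether this removes the stall on a given driver is something to measure rather than assume.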

Thanks.

clEnqueueAcquireD3D11ObjectsKHR blocks for a long time