We're developing software that uses a PCI data acquisition card to read blocks of data (records) from an external instrument. These records are transferred to a Radeon Pro WX7100 using "DirectGMA", where a kernel processes the data. The results are then transferred to RAM for the client software to deal with.
The external instrument generates a fixed number of records, but some can go missing for various reasons, so the actual number of records transferred to the Radeon Pro can be fewer than our program expects.
The code has a "processing loop" that iterates "num_records_to_acquire" times; in each iteration we call clEnqueueWaitSignalAMD, execute the kernel, then read the buffer back to RAM. The problem is that when records go missing, the loop hangs on the call to clEnqueueWaitSignalAMD. Other than killing the thread that the loop runs in, is there any other way to handle this scenario?
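For context, here's a minimal sketch of that loop under some assumptions: the extension entry point is resolved via clGetExtensionFunctionAddressForPlatform, the marker for record i is i + 1, and all the names (queue, process_kernel, dgma_buf, result_buf, etc.) are placeholders rather than our real identifiers:

    #include <CL/cl.h>

    /* Pointer type matching the documented clEnqueueWaitSignalAMD prototype
       (cl_amd_bus_addressable_memory); resolve it with
       clGetExtensionFunctionAddressForPlatform. */
    typedef cl_int (CL_API_CALL *waitSignalAMD_fn)(cl_command_queue, cl_mem,
                                                   cl_uint, cl_uint,
                                                   const cl_event *, cl_event *);

    static cl_int acquire_records(cl_command_queue queue,
                                  cl_kernel process_kernel,
                                  cl_mem dgma_buf,     /* bus-addressable buffer */
                                  cl_mem result_buf,
                                  void *host_results,
                                  size_t result_bytes,
                                  size_t work_items,
                                  cl_uint num_records_to_acquire,
                                  waitSignalAMD_fn waitSignal)
    {
        for (cl_uint i = 0; i < num_records_to_acquire; ++i) {
            /* Block the queue until the external device writes marker (i + 1)
               into dgma_buf. If that record never arrives, this command never
               completes -- which is exactly where we hang. */
            cl_int err = waitSignal(queue, dgma_buf, i + 1, 0, NULL, NULL);
            if (err != CL_SUCCESS) return err;

            /* Process the record that just landed in the bus-addressable buffer. */
            err = clEnqueueNDRangeKernel(queue, process_kernel, 1, NULL,
                                         &work_items, NULL, 0, NULL, NULL);
            if (err != CL_SUCCESS) return err;

            /* Blocking read of the results back to host RAM for the client software. */
            err = clEnqueueReadBuffer(queue, result_buf, CL_TRUE, 0,
                                      result_bytes, host_results, 0, NULL, NULL);
            if (err != CL_SUCCESS) return err;
        }
        return CL_SUCCESS;
    }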
Hi Andrew,
Thank you for your query. I've forwarded it to our OpenCL team to see whether they can suggest any better solutions for handling this scenario. Once I get a reply, I'll share it with you.
Thanks.
As I've come to know, this is not defined behavior for the extension, so the current OpenCL runtime doesn't have any special logic to terminate the wait.
Thanks,
Thanks for the reply.
Once OpenCL is "stuck" in this way, we then attempt to release the objects:
clReleaseDevice()
clReleaseCommandQueue()
clReleaseContext()
clReleaseEvent()
We usually get a BSOD on the second call (clReleaseCommandQueue) though. Would releasing these objects in a different order help, or is there really no way around the problem?
I don't think releasing the queue or device will help in this case, because as per the OpenCL spec a command-queue is only deleted once all commands queued to it have finished, and similarly a device is only deleted once the command-queues attached to it have been released. In this case, the clEnqueueWaitSignalAMD command is still active and waiting for a signal value to be written by a clEnqueueWriteSignalAMD call.
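To relate that to the release sequence you listed (a sketch with placeholder variable names, not a recommendation):

    clReleaseDevice(device);
    clReleaseCommandQueue(queue);   /* the queue is only deleted once every queued
                                       command has finished, but the pending
                                       clEnqueueWaitSignalAMD command never
                                       completes -- this is the call where you
                                       report the BSOD */
    clReleaseContext(context);
    clReleaseEvent(event);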
Thanks.
Sorry to resurrect an old thread, but here's another scenario: the "source" device sends DMA buffers one after the other with an incrementing signal marker (1, 2, 3, ...), while the GPU waits for each buffer to arrive using clEnqueueWaitSignalAMD before processing it.
What would happen if the GPU is blocking on clEnqueueWaitSignalAMD(...3...) but, for some reason, the next DMA buffer to arrive carries a signal marker of 4 (i.e. 3 was skipped)? Would the GPU end up indefinitely "stuck" waiting for signal marker 3 to arrive?
As mentioned in the cl_amd_bus_addressable_memory extension documentation, the clEnqueueWaitSignalAMD API "instructs the OpenCL to wait until <value> is written to <buffer> before issuing the next command".
So, as per my understanding, when clEnqueueWaitSignalAMD() is called on a buffer with a marker value, the runtime expects a corresponding clEnqueueWriteSignalAMD() call on the same buffer with the same marker value to unblock it. For example, clEnqueueWaitSignalAMD(..., bufferA, marker1,...) will be unblocked by clEnqueueWriteSignalAMD(..., bufferA, marker1,...).
These two APIs are used for synchronization, and the marker value is what establishes that synchronization.
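A rough sketch of that pairing, assuming the documented signatures and using placeholder names (queueA, queueB, bufferA, with waitSignal/writeSignal standing for the entry points resolved via clGetExtensionFunctionAddressForPlatform). In your DirectGMA case, as I understand it, the source device plays the writer role by writing the marker over the bus rather than clEnqueueWriteSignalAMD being called:

    cl_uint marker1 = 1;

    /* Waiter side: queueB issues no further commands until "marker1" has been
       written into bufferA's signal location. */
    waitSignal(queueB, bufferA, marker1, 0, NULL, NULL);

    /* Writer side: writes "marker1" into bufferA (offset 0), which unblocks the
       wait above. Writing a different value, or never writing at all, leaves
       the wait pending indefinitely. */
    writeSignal(queueA, bufferA, marker1, 0, 0, NULL, NULL);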
Thanks.