
FirePro Development


DirectGMA documentation

I have been tasked with creating a motion control system that runs an FP64 algorithm on a Radeon Pro/Instinct HPC GPU in an infinite, GPU-exclusive loop. I need to access GPU memory via DirectGMA to minimize both latency and latency variation, perform the calculations, and then send the data out through a D/A converter, again initiated by the GPU. The result must be an autonomous, infinite, GPU-exclusive loop with deterministic timing at the microsecond level.

I have looked around, but I am having difficulty finding any documentation on DirectGMA and on how to develop code for it. Is there any in-depth documentation on DirectGMA and on how I can access GPU physical memory to/from a peripheral PCIe device? How does DirectGMA handle the address translation from virtual memory to GPU physical memory?

Thank you for any help you can offer.

3 Replies

Here are some samples showing DirectGMA features in OpenGL, DX11 and OpenCL: 

Here is a related document:


Thank you for your quick reaction.

Are there more detailed documents, such as a programmer's manual, a reference manual, or white papers?

I will use Linux with OpenCL or HIP, focusing on interfacing camera frame grabbers (input) and Ethernet interfaces (output).

Does the GPU framework allow GPU-exclusive loops that run indefinitely? I'm asking because some GPUs have built-in watchdogs that kill processes after x seconds.


Sorry, I could not find any other documents. It looks like the related old links are now broken. Here is a forum discussion that may give some insight into DirectGMA.

To interact with a FirePro/Radeon Pro GPU via DirectGMA, the third-party device or card must support DirectGMA as well. You may want to check with the card vendor whether they can provide samples and other references that demonstrate how to use DirectGMA with their card.

If your target platform is ROCm, I would suggest checking the documents available here:

Please note that ROCm-related support is provided on its GitHub site. Please use the link below to post any query or issue.


Does the GPU-framework allow for GPU-exclusive-loops that are infinite in time?

Sorry, the question is not clear to me. A typical program flow for GPU-related host code may look like the following:

while (1) {
    1. Wait for the frame data to be written to the buffer.
    2. Launch a kernel to process the frame data in the buffer.
    3. Once the processing is done, copy the result as required.
    4. When the frame buffer is ready for reuse, signal the frame grabber unit to load the next frame.
    5. Display the result or send it to a target unit.
}
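The host loop can be sketched in C as a CPU-only simulation. Everything here is an assumption for illustration: `wait_for_frame`, `process_frame`, `send_result`, `run_host_loop` and `FRAME_SIZE` are hypothetical placeholders, not DirectGMA, OpenCL or HIP calls, and the loop runs a fixed number of frames instead of `while(1)` so that it terminates.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SIZE 8 /* hypothetical number of samples per frame */

/* Hypothetical simulated sensor value; a real system reads from the grabber. */
static double g_next_value = 0.0;

/* Step 1 (simulated): "wait" until a full frame has been written to buf. */
static void wait_for_frame(double *buf) {
    for (size_t i = 0; i < FRAME_SIZE; ++i)
        buf[i] = g_next_value;
    g_next_value += 1.0;
}

/* Step 2 (simulated): stand-in for the GPU kernel; sums the frame samples. */
static double process_frame(const double *buf) {
    double sum = 0.0;
    for (size_t i = 0; i < FRAME_SIZE; ++i)
        sum += buf[i];
    return sum;
}

/* Step 5 (simulated): "send" the result to a target unit by storing it. */
static void send_result(double *out, size_t idx, double value) {
    out[idx] = value;
}

/* Runs n_frames iterations of the loop instead of while(1). */
static void run_host_loop(double *results, size_t n_frames) {
    double frame[FRAME_SIZE];
    for (size_t f = 0; f < n_frames; ++f) {
        wait_for_frame(frame);           /* 1. wait for frame data        */
        double r = process_frame(frame); /* 2. process it ("kernel")      */
        /* 3. the result is already a host value here, so no copy needed  */
        /* 4. the buffer is free again; the next wait_for_frame refills it */
        send_result(results, f, r);      /* 5. send to the target unit    */
    }
}
```

For example, `run_host_loop(res, 3)` with `g_next_value` starting at 0 fills `res` with 0, 8 and 16. In a real system, step 2 would be a kernel launch and step 1 would block until the grabber reports that the frame is complete.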


P.S.: To improve performance, two buffers can be used as ping-pong buffers: while the GPU is busy with one buffer, the frame grabber can use the other one to store the next frame.
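A minimal sketch of the ping-pong idea, again as a CPU-only simulation under stated assumptions: `PingPong`, `grabber_index`, `swap_buffers`, `run_ping_pong` and `FRAME_SIZE` are hypothetical names, and the "GPU" and "grabber" steps run one after the other here, whereas a real system would run them concurrently and synchronize before each swap.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SIZE 4 /* hypothetical number of samples per frame */

/* Two frame buffers used alternately ("ping-pong"). */
typedef struct {
    int buf[2][FRAME_SIZE];
    int gpu_idx; /* index of the buffer the GPU is currently processing */
} PingPong;

/* The grabber may only fill the buffer the GPU is not using. */
static int grabber_index(const PingPong *pp) {
    return 1 - pp->gpu_idx;
}

/* Swap roles once both the GPU and the grabber are done with their buffers. */
static void swap_buffers(PingPong *pp) {
    pp->gpu_idx = 1 - pp->gpu_idx;
}

/* Simulates n_frames: the "grabber" writes frame f+1 into one buffer while
 * the "GPU" sums frame f from the other; the sum is stored in out[f]. */
static void run_ping_pong(PingPong *pp, int *out, size_t n_frames) {
    /* Prime: frame 0 (all zeros) is already in buffer 0. */
    pp->gpu_idx = 0;
    for (size_t i = 0; i < FRAME_SIZE; ++i)
        pp->buf[0][i] = 0;

    for (size_t f = 0; f < n_frames; ++f) {
        int *gpu_buf = pp->buf[pp->gpu_idx];
        int *grab_buf = pp->buf[grabber_index(pp)];

        /* In a real system these two steps would overlap in time. */
        for (size_t i = 0; i < FRAME_SIZE; ++i)
            grab_buf[i] = (int)(f + 1); /* grabber: store the next frame */
        int sum = 0;
        for (size_t i = 0; i < FRAME_SIZE; ++i)
            sum += gpu_buf[i];          /* GPU: process the current frame */
        out[f] = sum;

        swap_buffers(pp);
    }
}
```

Because the two roles always use different buffers, the grabber never overwrites the frame the GPU is reading; the roles only need to be synchronized at the swap, when both have finished with their current buffer.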