6 Replies Latest reply on Sep 3, 2015 8:49 AM by dmitryk

    DirectGMA between a FPGA and GPU



  I'm working on direct communication between an FPGA PCIe board (Altera) and a GPU (nVidia for the moment). We easily managed to make the FPGA write into memory exposed by the GPU, where a kernel polls to detect new data, but we couldn't make the GPU write into the FPGA's memory. We couldn't find any function that would let us give the GPU an FPGA physical address it could use (CUDA expects an address in system memory and can't handle memory on the PCIe bus). So today the FPGA has to read from the GPU's memory when the CPU tells it to, which is quite inefficient and leads to other problems (optimal synchronization is difficult).

  DirectGMA seems to be exactly what we need, as the DirectGMA page says it allows a GPU to write directly into the memory of a device supporting DirectGMA. Besides that, we are very interested in using OpenCL instead of CUDA.

  My question is: what does an FPGA board need in order to be considered a device supporting DirectGMA? Is it possible to use the same polling mechanism on the GPU side to detect fresh data? We would prefer not to rely on interrupts, as our application is very latency sensitive.



        • Re: DirectGMA between a FPGA and GPU

          Welcome. I have whitelisted you and moved this question into the DGMA forum.

          • Re: DirectGMA between a FPGA and GPU

            Hi sainterme,


            DirectGMA makes it possible for the GPU to write directly into FPGA memory if the FPGA driver conforms to the DirectGMA requirements. Namely, it has to:


            • Allocate a buffer in visible memory which the GPU can access via DirectGMA
            • Be able to provide the entire address range of this buffer to the application (the app later passes that to the AMD driver)
            • Accept the address range of GPU memory and make sure to write only into that range (this is only needed if the FPGA wants to write into GPU memory)
            • Ensure all allocated buffers are page aligned
            • Conform to the transfer synchronization rules (see below)


            From the OpenCL point of view, the buffer allocated in FPGA memory should be created with the CL_MEM_EXTERNAL_PHYSICAL_AMD flag.
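
A sketch of what that looks like on the host, assuming the cl_amd_bus_addressable_memory extension is available; fpga_bus_addr and fpga_marker_addr are hypothetical values obtained from the FPGA driver, and error handling is trimmed:

```c
/* Wrap an FPGA buffer so the GPU can write into it directly.
 * Requires cl_ext.h from the AMD APP SDK and a DirectGMA-capable FirePro. */
cl_bus_address_amd addr;
addr.surface_bus_address = fpga_bus_addr;    /* physical bus address of the FPGA buffer */
addr.marker_bus_address  = fpga_marker_addr; /* bus address of its marker location */

cl_int err;
cl_mem fpga_buf = clCreateBuffer(context,
                                 CL_MEM_EXTERNAL_PHYSICAL_AMD,
                                 buf_size,
                                 &addr,   /* host_ptr carries the bus addresses */
                                 &err);

/* The GPU can now write into fpga_buf with ordinary commands
 * (kernels, clEnqueueCopyBuffer, ...) and signal completion with:
 * clEnqueueWriteSignalAMD(queue, fpga_buf, marker_value, 0, 0, NULL, NULL); */
```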

            As for polling: DirectGMA uses the concept of markers to correctly synchronize data transfers. Each DirectGMA buffer has an associated marker, which is just a memory location to which the FPGA or GPU writes after the transfer finishes. PCIe write ordering guarantees correct synchronization in this case. In OpenCL this is achieved with the clEnqueueWriteSignalAMD/clEnqueueWaitSignalAMD calls. The FPGA driver has to provide a similar interface to establish correct ordering of operations.

            1 of 1 people found this helpful
              • Re: DirectGMA between a FPGA and GPU

                hi dmitryk, thanks for your very precise answer. You convinced me in 10 lines to buy a FirePro.


                I have to add that our OS is Linux (CentOS or Debian): the FPGA memory is made accessible through a BAR. The FPGA PCIe driver provides an mmap function based on remap_pfn_range, so the user application can get a pointer for accessing the FPGA memory. Is this kind of pointer suitable for the GPU driver to retrieve everything it needs (physical addresses, I guess) to initiate a DirectGMA mechanism?


                About the markers used for synchronization: as I understand it, they act as semaphores, or am I wrong? Are they located in system memory, in which case the CPU has a role in the synchronization, or are they in the FPGA memory (when the GPU is the sender) and in the GPU memory (when the FPGA is the sender)? As our application (real-time control) is very latency sensitive, we would like the FPGA and the GPU to do their jobs without the host CPU interfering, ideally in an infinite loop (we work on a stream of images).




                  • Re: DirectGMA between a FPGA and GPU

                    Basically, the GPU driver needs the physical bus address of the buffer, so a mapped user-space pointer is not enough to get by. On the other hand, the GPU driver provides bus addresses for DirectGMA buffers created in GPU memory with OpenCL via the cl_bus_address_amd structure, so you can pass those to the FPGA driver.
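
The GPU-side counterpart might look like this (again assuming the cl_amd_bus_addressable_memory extension; FPGA_SetTargetAddresses is a hypothetical FPGA-driver call, and error handling is trimmed):

```c
/* Expose a GPU buffer to the FPGA. The AMD driver returns the bus
 * addresses in a cl_bus_address_amd structure, which you then hand
 * to the FPGA driver. */
cl_mem gpu_buf = clCreateBuffer(context, CL_MEM_BUS_ADDRESSABLE_AMD,
                                buf_size, NULL, &err);

cl_bus_address_amd addr;
clEnqueueMakeBuffersResidentAMD(queue, 1, &gpu_buf, CL_TRUE,
                                &addr, 0, NULL, NULL);

/* addr.surface_bus_address: where the FPGA should write the data,
 * addr.marker_bus_address:  where it should write the marker value. */
FPGA_SetTargetAddresses(addr.surface_bus_address,
                        addr.marker_bus_address);
```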


                    The markers are located in the remote memories: in FPGA memory if the GPU is the sender, and in GPU memory if the FPGA is the sender, so no host interaction is required except submitting a marker value to the command buffer. Let's say the FPGA writes to GPU memory and you want to synchronize the transfer:


                    FPGA part:

                    1) You are initiating the transfer: FPGA_WriteBuffer(buf);

                    2) You are writing value X into the marker: FPGA_WriteMarker(buf, X);


                    GPU part:

                    1) Send rendering commands not related to DirectGMA buffer

                    2) Prepare to send commands dependent on DirectGMA buffer, make sure it is synced passing the value X: clEnqueueWaitSignalAMD(buf, X);

                    3) This blocks GPU until it actually sees value X written at marker location, which means transfer has been completed (PCI-E ordering)

                    4) Send commands which use DirectGMA buffer


                    So working in an infinite loop is entirely possible: you can just use a frame counter as the marker value. You can contact us directly at FirePro.developers@amd.com to get more details.
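

                    Put together, a GPU-side receive loop along these lines might look as follows. This is a sketch, not a complete program: queue, gpu_buf, process_kernel, and global_size are assumed to have been set up beforehand, and error handling is trimmed.

```c
/* Streaming loop on the GPU side, FPGA -> GPU direction.
 * The frame counter is used as the marker value, so each iteration
 * waits for the matching FPGA write to land (PCIe write ordering
 * guarantees the data arrived before the marker). */
for (cl_uint frame = 1; ; ++frame) {
    /* Stall the GPU queue until the FPGA has written marker value 'frame'. */
    clEnqueueWaitSignalAMD(queue, gpu_buf, frame, 0, NULL, NULL);

    /* Safe to consume the freshly written data now. */
    clSetKernelArg(process_kernel, 0, sizeof(cl_mem), &gpu_buf);
    clEnqueueNDRangeKernel(queue, process_kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, NULL);
    clFlush(queue); /* submit work; the wait happens on the GPU, not the host */
}
```

Since the wait is enqueued rather than performed on the CPU, the host only submits commands and never polls, which matches the latency-sensitive, CPU-free loop described above.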