
sainterme
Journeyman III

DirectGMA between a FPGA and GPU

Hi,

I'm working on direct communication between an FPGA PCIe board (Altera) and a GPU (NVIDIA for the moment). We easily managed to make the FPGA write into memory exposed by the GPU, where a kernel polls to detect new data, but we couldn't make the GPU write into the FPGA's memory. In fact, we couldn't find any function that lets us give the GPU an FPGA physical address it could use (CUDA wants an address in system memory and can't handle memory on the PCIe bus). So today the FPGA has to read from the GPU's memory when the CPU tells it to, which is quite inefficient and leads to other problems (optimal synchronization is difficult).

DirectGMA seems to be exactly what we need: the DirectGMA page says it allows a GPU to write directly into the memory of a device that supports DirectGMA, and besides that, we are very interested in using OpenCL instead of CUDA.

My question is: what does an FPGA board need in order to be considered a device that supports DirectGMA? Is it possible to use the same polling mechanism on the GPU side to detect fresh data? We would prefer not to rely on interrupts, as our application is very latency sensitive.

Regards,

Sainterme.

6 Replies
jtrudeau
Staff

Welcome. I have whitelisted you and moved this question into the DGMA forum.

dmitryk
Staff

Hi sainterme,

DirectGMA makes it possible for the GPU to write directly into FPGA memory, provided the FPGA driver conforms to the DirectGMA requirements. Namely, it has to:

  • Allocate a buffer in visible memory which the GPU can access via DirectGMA
  • Be able to provide the entire address range of this buffer to the application (the application later passes it to the AMD driver)
  • Accept the address range of GPU memory and make sure to write only into that range (this is only needed if the FPGA wants to write into GPU memory)
  • Ensure all allocated buffers are page aligned
  • Conform to the transfer synchronization rules (see below)

From the OpenCL point of view, the buffer allocated in FPGA memory should be created with the CL_MEM_EXTERNAL_PHYSICAL_AMD flag.
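
For concreteness, a minimal host-side sketch of that allocation, as I recall the cl_amd_bus_addressable_memory extension (the cl_bus_address_amd struct goes in through the host_ptr argument); fpga_surface_addr and fpga_marker_addr are hypothetical bus addresses reported by your FPGA driver, and context/buf_size are assumed to exist already:

    /* Wrap an FPGA PCIe region as an OpenCL buffer the GPU can write into. */
    cl_bus_address_amd bus_addr;
    bus_addr.surface_bus_address = fpga_surface_addr; /* data region in FPGA memory   */
    bus_addr.marker_bus_address  = fpga_marker_addr;  /* marker location in FPGA memory */

    cl_int err;
    cl_mem fpga_buf = clCreateBuffer(context,
                                     CL_MEM_EXTERNAL_PHYSICAL_AMD,
                                     buf_size,   /* page aligned, per the list above  */
                                     &bus_addr,  /* bus addresses passed via host_ptr */
                                     &err);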

As for polling: DirectGMA uses the concept of markers to synchronize data transfers correctly. Each DirectGMA buffer has an associated marker. This is just a memory location to which the FPGA or GPU writes after the transfer finishes; PCI-E write ordering guarantees correct synchronization in this case. In OpenCL this is achieved with the clEnqueueWriteSignalAMD/clEnqueueWaitSignalAMD calls. The FPGA driver has to provide a similar interface to establish the correct ordering of operations.
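
As an illustration of the GPU-to-FPGA direction, a hedged sketch: the *SignalAMD calls are extension entry points, normally fetched with clGetExtensionFunctionAddressForPlatform(), and queue, gpu_result_buf, fpga_buf, buf_size and frame are assumed from the surrounding code:

    /* GPU is the sender: copy the result into the FPGA-backed buffer, then
       write the frame number into the marker that lives in FPGA memory. */
    clEnqueueCopyBuffer(queue, gpu_result_buf, fpga_buf,
                        0, 0, buf_size, 0, NULL, NULL);
    clEnqueueWriteSignalAMD(queue, fpga_buf, frame, 0 /* marker offset */,
                            0, NULL, NULL);
    clFlush(queue);
    /* The FPGA polls its marker location until it reads 'frame'; PCI-E write
       ordering then guarantees the data writes have already landed. */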

Hi dmitryk, thanks for your very precise answer. You convinced me in 10 lines to buy a FirePro.

I should add that our OS is Linux (CentOS or Debian): the FPGA memory is made accessible through a BAR. The FPGA PCIe driver provides an mmap function based on remap_pfn_range, so the user application can get a pointer to access the FPGA memory. Is this kind of pointer suitable for the GPU driver to retrieve everything it needs (physical addresses, I guess) to set up the DirectGMA mechanism?
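
For reference, a minimal sketch of the mapping described above (the device node /dev/fpga0, the BAR offset and buf_size are hypothetical; the driver's mmap handler is assumed to be the remap_pfn_range-based one mentioned):

    #include <fcntl.h>
    #include <sys/mman.h>

    /* Map the FPGA BAR region exposed by the driver into user space. */
    int fd = open("/dev/fpga0", O_RDWR);
    void *fpga_mem = mmap(NULL, buf_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0 /* BAR offset */);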

About the markers used for synchronization: as I understand it, they act as semaphores, or am I wrong? Are they located in system memory, in which case the CPU has a role in the synchronization, or are they in the FPGA memory (when the GPU is the sender) and in the GPU memory (when the FPGA is the sender)? As our application (real-time control) is very latency sensitive, we would like the FPGA and the GPU to do their jobs without the host CPU interfering, ideally in an infinite loop (we work on a stream of images).

Thanks,

Sainterme


Basically the GPU driver needs the physical bus address of the buffer, so a mapped user-space pointer is not enough to get by. On the other hand, the GPU driver provides bus addresses for DirectGMA buffers created in GPU memory with OpenCL via the cl_bus_address_amd structure, so you can pass those to the FPGA driver.
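
To make that concrete, a sketch of how those bus addresses would come back on the OpenCL side, assuming the usual cl_amd_bus_addressable_memory entry points; fpga_driver_set_target() is a hypothetical call in your own FPGA driver's API:

    /* Create a bus-addressable buffer in GPU memory, make it resident,
       and hand the returned bus addresses over to the FPGA driver. */
    cl_int err;
    cl_mem gpu_buf = clCreateBuffer(context, CL_MEM_BUS_ADDRESSABLE_AMD,
                                    buf_size, NULL, &err);

    cl_bus_address_amd bus_addr;
    clEnqueueMakeBuffersResidentAMD(queue, 1, &gpu_buf, CL_TRUE,
                                    &bus_addr, 0, NULL, NULL);

    /* surface_bus_address: where the FPGA writes the data
       marker_bus_address:  where the FPGA writes the marker value */
    fpga_driver_set_target(bus_addr.surface_bus_address,
                           bus_addr.marker_bus_address);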

The markers are located in remote memory: FPGA memory if the GPU is the sender, and GPU memory if the FPGA is the sender, so no host interaction is required except submitting a marker value to the command buffer. Let's say the FPGA writes to GPU memory and you want to synchronize the transfer:

FPGA part:

1) You initiate the transfer: FPGA_WriteBuffer(buf);

2) You write the value X into the marker: FPGA_WriteMarker(buf, X);

GPU part:

1) Send rendering commands not related to the DirectGMA buffer

2) Prepare to send commands that depend on the DirectGMA buffer; make sure it is synced by passing the value X: clEnqueueWaitSignalAMD(buf, X);

3) This blocks the GPU until it actually sees the value X written at the marker location, which means the transfer has completed (PCI-E ordering)

4) Send commands that use the DirectGMA buffer

So working in an infinite loop is entirely possible; you can just use a frame counter as the marker value. You can contact us directly at FirePro.developers@amd.com for more details.
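
Putting the steps above together, a hedged sketch of the steady-state loop on the OpenCL side, with run_kernel_on() as a hypothetical stand-in for whatever commands consume the buffer:

    /* FPGA -> GPU direction, one iteration per frame.  The FPGA writes the
       image and then the frame number; the GPU queue (not the host) stalls
       on that marker value before the dependent commands run. */
    for (cl_uint frame = 1; ; ++frame) {
        /* FPGA side, in parallel: FPGA_WriteBuffer(buf); FPGA_WriteMarker(buf, frame); */
        clEnqueueWaitSignalAMD(queue, gpu_buf, frame, 0, NULL, NULL);
        run_kernel_on(queue, gpu_buf);   /* hypothetical: commands using the buffer */
        clFlush(queue);
    }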


OK, if the GPU driver only needs the physical address of the FPGA buffer, it's even easier for us, and the handling of the markers seems quite straightforward and logical.

Does the FPGA memory have to be real memory (where you can write and read)? In our case it would be more convenient if it were a FIFO (seen from the GPU side as write-only), as the data are intended to be sent directly to 10G Ethernet through a hardware UDP stack.


There shouldn't be any problems with write-only FPGA memory when doing a GPU->FPGA transfer.
