Welcome. I have whitelisted you and moved this question into the DGMA forum.
DirectGMA makes it possible for the GPU to write directly into FPGA memory, provided the FPGA driver conforms to the DirectGMA requirements. Namely, it has to:
- Allocate a buffer in visible memory that the GPU can access via DirectGMA
- Be able to provide the entire address range of this buffer to the application (the application later passes it to the AMD driver)
- Accept the address range of GPU memory and make sure to write only into that range (this is only needed if the FPGA wants to write into GPU memory)
- Ensure all allocated buffers are page aligned
- Conform to the transfer synchronization rules (see below)
From the OpenCL point of view, the buffer allocated in FPGA memory should be created with the CL_MEM_EXTERNAL_PHYSICAL_AMD flag.
As for polling: DirectGMA uses the concept of markers to synchronize data transfers correctly. Each DirectGMA buffer has an associated marker. This is just a memory location to which the FPGA or GPU writes after the transfer has finished; PCIe write ordering guarantees correct synchronization in this case. In OpenCL this is achieved with the clEnqueueWriteSignalAMD/clEnqueueWaitSignalAMD calls. The FPGA driver has to provide a similar interface to establish the correct ordering of operations.
Hi dmitryk, thanks for your very precise answer. You convinced me in 10 lines to buy a FirePro.
I have to add that our OS is Linux (CentOS or Debian): the FPGA memory is made accessible through a BAR. The FPGA PCIe driver provides an mmap function based on remap_pfn_range, so the user application can get a pointer to access the FPGA memory. Is this kind of pointer suitable for the GPU driver to retrieve everything it needs (physical addresses, I guess) to initiate the DirectGMA mechanism?
About the markers used for synchronization: as I understand it, they act as semaphores, or am I wrong? Are they located in system memory, in which case the CPU has a role in the synchronization, or are they in the FPGA memory (when the GPU is the sender) and in the GPU memory (when the FPGA is the sender)? As our application (real-time control) is very latency sensitive, we would like the FPGA and the GPU to do their jobs without the host CPU interfering, ideally in an infinite loop (we work on a stream of images).
Basically the GPU driver needs the physical bus address of the buffer, so a mapped user-space pointer is not enough. On the other hand, the GPU driver provides bus addresses for DirectGMA buffers created in GPU memory with OpenCL via the cl_bus_address_amd structure, so you can pass those to the FPGA driver.
The markers are located in the remote memories: FPGA memory if the GPU is the sender, and GPU memory if the sender is the FPGA, so no host interaction is required except submitting a marker value to the command buffer. Let's say the FPGA writes to GPU memory and you want to synchronize the transfer:
On the FPGA side:
1) You initiate the transfer: FPGA_WriteBuffer(buf);
2) You write value X into the marker: FPGA_WriteMarker(buf, X);
On the GPU side:
1) Send rendering commands not related to the DirectGMA buffer
2) Prepare to send commands dependent on the DirectGMA buffer; make sure it is synced by passing the value X: clEnqueueWaitSignalAMD(buf, X);
3) This blocks the GPU until it actually sees value X written at the marker location, which means the transfer has completed (PCIe ordering)
4) Send the commands which use the DirectGMA buffer
So working in an infinite loop is entirely possible; you can just use a frame counter as the marker value. You can contact us directly at FirePro.firstname.lastname@example.org to get more details.
OK, if the GPU driver only needs the physical address of the FPGA buffer, it's even easier for us, and the handling of the markers seems quite straightforward and logical.
Does the FPGA memory have to be real memory (where you can write and read)? In our case it would be more convenient if it were a FIFO (seen from the GPU side as write-only), as the data are intended to be sent directly to 10G Ethernet through a hardware UDP stack.
There shouldn't be any problem with write-only FPGA memory when doing GPU->FPGA transfers.