Greetings!
I am new to the forum. I am an FPGA developer who is becoming interested in GPU/FPGA co-processing using OpenCL.
I am developing a very high-throughput, low-latency system for data acquisition and processing that keeps the CPU out of the time-critical path. Our new system is much like Fig. 3 of http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7111377 except that, instead of going to system memory, the data goes back to the FPGA via DirectGMA. This creates a closed loop (acquisition, processing, transmission to subsystems) with little CPU effort.
We currently have some questions and would appreciate your feedback and suggestions.
1) What is the correct way to signal that data has been copied to a specific location in GPU memory, so that the processing kernels and the subsequent reply are triggered?
(Before this, we declared some memory as accessible to external devices and sent its address to the FPGA, which then copied some data into it.)
2) We have tried to read data allocated in GPU memory from the FPGA side, but the GPU answers that this request is not supported. This is why I need 1). Maybe I am doing something wrong. Is it supported, or could I enable it somehow?
3) Is there a way to increase the TLP payload size for DMA transfers from the GPU to the FPGA? At the moment it is only 64 bytes, and we would like to raise it to the maximum payload negotiated on the bus (128 or 256 bytes). The FPGA-to-GPU direction already uses 256 bytes.
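To put question 3 in numbers, here is a rough sketch of why the payload size matters to us. It assumes about 24 bytes of overhead per TLP (framing, sequence number, a 4 DW header and LCRC). That figure is an assumption that varies with PCIe generation and header type, and the estimate ignores DLLP and flow-control traffic, so treat the results as approximate.

```python
# Rough PCIe link efficiency as a function of TLP payload size.
# ASSUMPTION: ~24 bytes of overhead per TLP (framing, sequence number,
# 4 DW header, LCRC). The real value depends on the link and header type.
TLP_OVERHEAD_BYTES = 24


def tlp_efficiency(payload_bytes, overhead_bytes=TLP_OVERHEAD_BYTES):
    """Return the fraction of bytes on the link that carry payload for one TLP."""
    return payload_bytes / (payload_bytes + overhead_bytes)


# Compare the payload sizes mentioned in question 3.
for payload in (64, 128, 256):
    print(f"{payload:3d} B payload: {tlp_efficiency(payload):.1%} of link bytes are data")
```

Under that assumption, 64-byte payloads use roughly 73% of the link for data, against roughly 84% at 128 bytes and 91% at 256 bytes.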
I think these are all my questions for now. I would also like to follow up on some other posts that are already on the forum, so please give me access!
Cheers,
Luis