I would like to discuss the possibility of transfer sizes larger than 128 MB from the FPGA to the GPU.
One elegant solution we came up with, which we call "streaming", works in the following steps:
1) At the beginning, we allocate two 64 MB buffers for DirectGMA.
2) The FPGA writes 64 MB of data into the first buffer.
3) We now start the data transfer from the FPGA into the second buffer.
In the meantime, we somehow "deallocate" the CL_MEM_BUS_ADDRESSABLE_AMD property of the first buffer, but keep a reference to it for later use. After this "deallocation", we create a third 64 MB buffer for DirectGMA.
4) We repeat this: the FPGA writes into the third buffer, the second buffer is "deallocated", a fourth buffer is created, and so on.
5) The transfer is finished.
In the end, the GPU would hold the full data set in place. This is more elegant than our current solution, which copies the buffers allocated for DirectGMA into "saving buffers" (allocated normally) during the transfer: it avoids copying data on the GPU, and therefore the possible memory latency such copies may cause for DirectGMA.
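To make the proposed schedule concrete, here is a small host-side model in C of steps 1 to 4. It is a sketch, not driver code: it only counts buffers, using the 64 MB chunk size from the proposal and the 128 MB limit from the thread, and the function names streaming_buffer_count and streaming_peak_aperture are hypothetical. It assumes, as in step 3, that the previous buffer gives up its bus-addressable property before the next one is created. Under that assumption, the number of buffers grows with the transfer size, but at most two of them (128 MB) would ever need to be bus-addressable at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes from the thread: 64 MB chunks, 128 MB bus-addressable aperture. */
#define MIB ((size_t)1024 * 1024)
#define CHUNK_BYTES (64 * MIB)
#define APERTURE_BYTES (128 * MIB)

/* Hypothetical helper: number of 64 MB GPU buffers the "streaming" scheme
   would create to receive total_bytes (the last one may be partly filled). */
size_t streaming_buffer_count(size_t total_bytes)
{
    return (total_bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
}

/* Hypothetical model of steps 1-4. Returns the peak number of bytes that
   are bus-addressable at the same time, assuming the previous buffer drops
   its bus-addressable property before the next one is created. */
size_t streaming_peak_aperture(size_t total_bytes)
{
    size_t n = streaming_buffer_count(total_bytes);
    /* Step 1: allocate up to two buffers in the aperture. */
    size_t resident = (n < 2) ? n : 2;
    size_t next_to_create = resident;
    size_t peak = resident * CHUNK_BYTES;

    /* Steps 3-4: each time the FPGA moves on to buffer k ... */
    for (size_t k = 1; k < n; ++k) {
        resident -= 1;                  /* ... buffer k-1 leaves the aperture */
        if (next_to_create < n) {       /* ... and buffer k+1 is created */
            resident += 1;
            next_to_create += 1;
        }
        size_t in_use = resident * CHUNK_BYTES;
        assert(in_use <= APERTURE_BYTES);
        if (in_use > peak)
            peak = in_use;
    }
    return peak;
}
```

For example, a 512 MB transfer would need eight 64 MB buffers while the peak aperture use stays at 128 MB. The model says nothing about whether the driver can actually drop the bus-addressable property of a buffer that is still in use, which is the first of the questions that follow.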
- How can we remove the DirectGMA property from a buffer? clRelease and free do it, but the buffer can no longer be used afterwards.
- In the example above, I create a third buffer after freeing the first one, but the new buffer gets the address of the first one, and therefore its data. Is it possible to go beyond these 128 MB in terms of address space?
So, is this possible?
One more question: is it possible to measure how copies on the GPU affect the memory bandwidth available for DirectGMA?
Thanks a lot in advance for your support,
Basically, you can free a BUS_ADDRESSABLE buffer by calling clReleaseMemObject, but I would not recommend doing this. After releasing the buffer, you would need to create a new one and call clEnqueueMakeBuffersResidentAMD again.
I think it would be easier to allocate two buffers: while the FPGA is writing into buffer 2, you can copy the content of buffer 1 into a "regular" CL buffer that was not allocated in the PCIe aperture. Further processing on the GPU can then use this buffer, and the FPGA can write the next frame into buffer 1.
Basically, this is the same approach as you suggested, but instead of deallocating the BUS_ADDRESSABLE buffers, you copy them into "regular" buffers and reuse the BUS_ADDRESSABLE buffers to receive the next frame from the FPGA.
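The double-buffering scheme suggested above can be illustrated with a host-only simulation in C. This is a sketch under stated assumptions: plain host arrays stand in for the two BUS_ADDRESSABLE buffers and for the "regular" CL buffer, memcpy stands in for the GPU-side copy (in OpenCL this would be something like clEnqueueCopyBuffer), and fpga_write_frame and stream_frames are hypothetical names. What it shows is the buffer rotation: frame i lands in staging buffer i % 2, and frame i-1 is drained from the other staging buffer before that buffer is reused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the FPGA writing frame `index` into a
   bus-addressable staging buffer: fills it with a recognisable pattern. */
static void fpga_write_frame(unsigned char *staging, size_t frame_bytes,
                             size_t index)
{
    memset(staging, (int)(index & 0xFF), frame_bytes);
}

/* Host-only simulation of double buffering. staging[0] and staging[1] play
   the two BUS_ADDRESSABLE buffers; `regular` plays the large ordinary buffer
   that collects all frames. memcpy stands in for the GPU-side copy. */
void stream_frames(unsigned char *staging[2], unsigned char *regular,
                   size_t frame_bytes, size_t n_frames)
{
    assert(staging[0] != staging[1]); /* the two staging buffers must differ */

    for (size_t i = 0; i < n_frames; ++i) {
        /* The FPGA fills staging buffer i % 2 ... */
        fpga_write_frame(staging[i % 2], frame_bytes, i);
        /* ... while the GPU drains the previous frame from the other one. */
        if (i > 0)
            memcpy(regular + (i - 1) * frame_bytes,
                   staging[(i - 1) % 2], frame_bytes);
    }
    /* Drain the final frame once the FPGA has stopped writing. */
    if (n_frames > 0) {
        size_t last = n_frames - 1;
        memcpy(regular + last * frame_bytes, staging[last % 2], frame_bytes);
    }
}
```

In the real system the FPGA write and the GPU copy run concurrently, so the copy out of a staging buffer must complete (for example by waiting on the copy's event with clWaitForEvents) before the FPGA is told to write the next frame into it. This sequential simulation gets that ordering for free.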
Thanks for the clarification. So it is not possible.
This idea of double buffering on the GPU is in fact our current solution to the size problem (which we still need to improve on the DMA engine side). We were wondering how the GPU copy operations into classic cl buffers, performed while the DirectGMA transfer is running, affect the transfer speed, since they may reduce the available bandwidth. Our current setup does not let us judge this correctly.
Could you tell us, in theory, what happens if the memory bandwidth is almost saturated by intensive compute operations and, at the same time, we want to initiate a DirectGMA transfer to the GPU?
Thanks for the answer and for your work, Chris,