Archives Discussions

rick_weber · ‎04-19-2011

I'm curious about overlapping data transfers and computation. My understanding is that the DMA engines are independent of kernel execution. In 4.4.5, you state that in the future, if you issue a bunch of independent data transfers and kernels to a queue and then flush it, the OpenCL runtime will "keep the GPU busy with kernel execution and DMA transfers." Does this mean AMD's runtime will (in the future) support overlapping computation and transfers from the command queues alone? I'm trying to put together a higher-level wrapper for clUtil that supports overlap on both AMD and NVIDIA devices without switching on each device's platform and doing whatever magic that particular platform requires.

Basically, NVIDIA requires that you allocate buffers using CL_MEM_ALLOC_HOST_PTR and then use clEnqueueMapBuffer() for all your data transfer needs (to create a pinned buffer). You then create two command queues and issue commands to each one.

Will this same approach overlap computation and transfers once DMA is fully implemented in APP?

nou · ‎04-19-2011

You can't get overlapped transfers with an in-order queue, as there is implicit synchronization between commands. Only with two queues or an out-of-order queue can you get overlapping execution of commands.
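To illustrate the in-order versus two-queue point, here is a toy timeline model in plain C. It is not OpenCL code and makes no claim about how any particular runtime schedules work. The function names, the chunk count, and the durations are all hypothetical, and it assumes one DMA engine, one compute resource, and enough device buffers that transfers can run ahead of the kernels.

```c
/* Toy timeline model, NOT OpenCL code. Durations are hypothetical and in
   arbitrary time units. Assumes one DMA engine and one compute resource. */

/* One in-order queue: every command waits for the previous one,
   so a transfer and a kernel never run at the same time. */
double serial_time(int chunks, double xfer, double kern)
{
    double t = 0.0;
    for (int i = 0; i < chunks; ++i) {
        t += xfer;  /* host-to-device copy of chunk i */
        t += kern;  /* kernel that consumes chunk i */
    }
    return t;
}

/* Two queues (or one out-of-order queue with event dependencies):
   the kernel for chunk i waits only for its own transfer and for the
   previous kernel, so the copy of chunk i+1 can overlap kernel i.
   Assumes enough device buffers for transfers to run ahead. */
double overlapped_time(int chunks, double xfer, double kern)
{
    double dma_free = 0.0;  /* time at which the DMA engine is idle again */
    double gpu_free = 0.0;  /* time at which the compute units are idle again */
    for (int i = 0; i < chunks; ++i) {
        double xfer_done = dma_free + xfer;  /* copies run back to back */
        dma_free = xfer_done;
        /* the kernel starts once its data has arrived and the GPU is free */
        double start = (xfer_done > gpu_free) ? xfer_done : gpu_free;
        gpu_free = start + kern;
    }
    return gpu_free;
}
```

With 4 chunks, a transfer time of 2.0 and a kernel time of 3.0, serial_time gives 20.0 and overlapped_time gives 14.0. With a single chunk the two are equal, since there is nothing to overlap.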

There is a TransferOverlap example, but it seems that the buffer is in host memory, so the GPU accesses the buffer through the PCIe bus.

rick_weber · ‎04-19-2011

Yeah, I saw that, but it uses a non-standard construct, so I want to avoid it if possible. Furthermore, the documentation states that writing with that method is slower than copying large buffers if your data isn't sparse (though if you're overlapping with computation, it may be a net gain either way).

Clarification of 4.4.5