AFAIK, APP Profiler used to always serialize memory transfers, so even when I used zero-copy buffers (and APP Profiler confirmed that they were zero-copy), memory transfers and kernel execution didn't overlap. Has this been fixed in CodeXL 1.1?
CodeXL GPU Profiler/APP Profiler doesn't change program behavior.
When you do zero copy (zero duration for map/unmap), the profiler can't display a data transfer bar on the timeline, because we can't bracket this type of memory access the way we can GPU activities. Kernel execution and memory transfer still overlap under the hood.
To visualize it, you can use CLPerfMarker.
Add Begin and End markers around the CPU/host code that performs your memory transfer.
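A minimal sketch of that bracketing in C++. It assumes the CLPerfMarker entry points are clBeginPerfMarkerAMD(name, group) and clEndPerfMarkerAMD() from CLPerfMarker.h, as I recall them from the profiler documentation, so check the header shipped with your version. ScopedPerfMarker, fillMappedBuffer and the USE_CLPERFMARKER switch are hypothetical names used only for illustration. Without that switch the block falls back to logging stand-ins so it builds without the profiler SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

#ifdef USE_CLPERFMARKER
// Real markers: header and library ship with the profiler.
// As far as I recall, the library also expects clInitializePerfMarkerAMD()
// once at startup and clFinalizePerfMarkerAMD() at shutdown.
#include "CLPerfMarker.h"
#else
// Stand-ins (assumption, not the real library): they only log the calls.
static std::vector<std::string> g_markerLog;

static int clBeginPerfMarkerAMD(const char* name, const char* group) {
    g_markerLog.push_back(std::string("begin:") + group + "/" + name);
    return 0;
}

static int clEndPerfMarkerAMD() {
    g_markerLog.push_back("end");
    return 0;
}
#endif

// Begins a marker on construction and ends it on destruction, so the
// End call cannot be forgotten on an early return.
class ScopedPerfMarker {
public:
    ScopedPerfMarker(const char* name, const char* group) {
        clBeginPerfMarkerAMD(name, group);
    }
    ~ScopedPerfMarker() { clEndPerfMarkerAMD(); }

    ScopedPerfMarker(const ScopedPerfMarker&) = delete;
    ScopedPerfMarker& operator=(const ScopedPerfMarker&) = delete;
};

// Hypothetical host-side transfer. `mapped` stands for the pointer that
// clEnqueueMapBuffer returned for a zero-copy buffer.
void fillMappedBuffer(float* mapped, const float* src, std::size_t count) {
    assert(mapped != nullptr && src != nullptr);
    ScopedPerfMarker marker("fill input buffer", "host transfers");
    std::memcpy(mapped, src, count * sizeof(float));
}
```

The marker covers only the host code between Begin and End, so on the timeline it shows when the CPU was touching the buffer, not a measured transfer duration.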
Another type of kernel execution and memory transfer overlap goes through the DMA engine. The OpenCL runtime can't time DMA transfers. Therefore, if profiling is enabled for the command queue, the DMA engine is disabled and the overlap won't happen. This has nothing to do with the profiler.
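One way to act on this is to make queue profiling opt-in, so normal runs create the queue without the profiling flag. A rough C++ sketch follows. QueueProps and kQueueProfilingEnable repeat the type and bit value of cl_command_queue_properties and CL_QUEUE_PROFILING_ENABLE from CL/cl.h only so the block builds without the OpenCL SDK. chooseQueueProps, timestampsRequested and the MYAPP_QUEUE_PROFILING variable are hypothetical names, not part of any SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Same width and bit as in CL/cl.h, repeated here as an assumption so the
// sketch compiles without the OpenCL headers.
using QueueProps = std::uint64_t;                      // cl_command_queue_properties
constexpr QueueProps kQueueProfilingEnable = 1u << 1;  // CL_QUEUE_PROFILING_ENABLE

// Request event timestamps only when asked for; otherwise leave the
// profiling bit clear.
QueueProps chooseQueueProps(bool wantTimestamps) {
    return wantTimestamps ? kQueueProfilingEnable : QueueProps{0};
}

// Hypothetical opt-in switch read from the environment.
bool timestampsRequested() {
    const char* value = std::getenv("MYAPP_QUEUE_PROFILING");
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// With the OpenCL headers available, the result would be passed as the
// properties argument when the queue is created:
//
//   cl_int err = CL_SUCCESS;
//   cl_command_queue queue = clCreateCommandQueue(
//       context, device, chooseQueueProps(timestampsRequested()), &err);
```

With the bit clear, events from that queue carry no profiling timestamps, so any code that calls clGetEventProfilingInfo has to be skipped in that mode.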
So, if I create my own queue in the app with profiling enabled, this queue can't do DMA at all?
Perhaps that is worth noting in the APP SDK manual, then.
And regarding zero copy: the profiler did still display some data transfer tab before, with APP Profiler.
I will check how CodeXL behaves and report back.
Only async DMA is disabled.