Hi Tomer,
Thanks for the clarifications. Nicely controlled environment.
Have you verified the timings from the host side? Host timers don't have CodeXL's resolution, but the time needed to complete the CopyBuffer should equal the sum of its queueing and execution times as reported by CodeXL's profiler.
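To make that comparison concrete, here is a minimal sketch. It assumes you have already read the four timestamps (in nanoseconds) that `clGetEventProfilingInfo` reports for `CL_PROFILING_COMMAND_QUEUED`, `_SUBMIT`, `_START` and `_END`, and that you timed the same CopyBuffer on the host around its enqueue and completion. The names `EventTimes`, `host_matches_profiler` and the 10% tolerance are hypothetical choices for illustration, not part of any API.

```python
from dataclasses import dataclass


@dataclass
class EventTimes:
    # Hypothetical container for the four OpenCL profiling timestamps (ns).
    queued: int
    submit: int
    start: int
    end: int

    def queueing_ns(self) -> int:
        # Time from enqueue until the device starts executing the command.
        return self.start - self.queued

    def execution_ns(self) -> int:
        # Time the device spent actually executing the command.
        return self.end - self.start


def host_matches_profiler(host_elapsed_ns: int, ev: EventTimes,
                          tolerance: float = 0.1) -> bool:
    """Return True if the host-measured time is within `tolerance`
    (relative) of queueing + execution as seen by the profiler."""
    expected = ev.queueing_ns() + ev.execution_ns()
    if expected <= 0:
        # Zero or negative duration already points at bad profiler data.
        return False
    return abs(host_elapsed_ns - expected) <= tolerance * expected
```

The tolerance is deliberately loose: the host measurement also includes API call overhead and coarser timer resolution, so it will usually come out somewhat larger than the profiler's figure. A large mismatch, though, would suggest the profiler timestamps rather than the copy itself are off.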
I have also seen weird execution times in my programs under the CodeXL profiler, even ones that violated the ordering of commands within a single queue.
I have even seen an event complete before the kernel had started, so I assumed this was a profiler issue.
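Anomalies like these can be flagged mechanically from the profiling timestamps. The sketch below assumes the commands are listed in enqueue order on a single in-order queue (i.e. created without `CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE`). In that case each command's own timestamps must be non-decreasing, and no command should start before the previous one has ended. The `Cmd` record and `find_anomalies` function are hypothetical helpers, not part of OpenCL or CodeXL.

```python
from typing import List, NamedTuple


class Cmd(NamedTuple):
    # Hypothetical record of one command's profiling timestamps (ns).
    name: str
    queued: int
    submit: int
    start: int
    end: int


def find_anomalies(cmds: List[Cmd]) -> List[str]:
    """Flag timestamps an in-order queue should never produce.
    `cmds` must be in the order the commands were enqueued."""
    problems = []
    for c in cmds:
        # Within one command: queued <= submit <= start <= end.
        if not (c.queued <= c.submit <= c.start <= c.end):
            problems.append(f"{c.name}: non-monotonic timestamps")
    for prev, cur in zip(cmds, cmds[1:]):
        # In-order queue: a command may not start before its predecessor ends.
        if cur.start < prev.end:
            problems.append(f"{cur.name} starts before {prev.name} ends")
    return problems
```

If the host-side check looks sane but this flags the profiler's numbers, that points at the profiler rather than at the CopyBuffer itself.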
(CodeXL is *very* buggy. I have given up raising tickets about it. :-( )
One last question: Is that the only CopyBuffer that looks like that, or are there more?