Is this just the first access to these 2 buffers? What do the rest of the CopyBuffers look like?
CreateBuffer seems to allocate lazily, so some buffer initialization could be happening on first use.
Finally, a question I've had throughout these performance threads: could any other processes be running at the same time? If this is a display card, could display rendering be responsible for some of these performance variations?
When we create the buffers we also enqueue a write to them to make sure they are actually created before we start using them, so this is not a case of lazy initialization.
As for other processes, that's not the case either. It's an 8-core machine, and the only process running is the one executing the OpenCL host code; nothing else time-consuming is running.
As for a display card, that's also not the issue. The display is using the Intel iGPU while the AMD GPU is used solely for OpenCL compute.
Thanks for the clarifications. Nicely controlled environment.
Have you verified the profiling on the host side? It doesn't have CodeXL's resolution, but the time needed to complete the CopyBuffer should equal the sum of the queueing and execution times reported by CodeXL's profiler.
I have also seen weird execution times in my programs under the CodeXL profiler, even ones that violate the ordering of a single queue.
I have even seen an event complete before its kernel had started, so I assumed this is a profiler issue.
(CodeXL is *very* buggy. Have given up raising tickets about it :-()
One last question: Is that the only CopyBuffer that looks like that, or are there more?