I have a few questions about these profiler results:
1) How is it possible to get simultaneous writes in 2 queues (to the same device)? There are 3 writes to the same device that overlap in time.
2) Why is there such a big gap between the 2 kernel executions? What prevents the next kernel from executing? And, as a result of that gap, why is the ReadBuffer API call time so long?
EDIT: here the write overlap can be seen more clearly: