While investigating performance variance for one of my clients, I found that the root cause appears to be a very large variance in, and slowdown of, clEnqueueCopyBuffer.
Attached is a screenshot showing a 4-byte copy on the GPU taking 8 ms. That is clearly a performance bug.
It does not happen at the beginning of processing, so it is not related to any kind of warm-up.
Tomer Gal, CTO at OpTeamizer