I'm seeing massive delays between a kernel being submitted to an AMD GPU and the kernel actually starting to execute. My program does blocking writes and reads (with blocking=CL_TRUE) to ensure that I/O isn't interfering with the kernel. I then use clGetEventProfilingInfo to get the queued, submit, start and end timestamps for each command. The data (and code) below show that the kernel spends about 3.26 seconds in the submitted state (between submit and start) and then about 3.19 seconds running. In general, the submitted time seems to scale with the running time. I've looked at a number of forum posts about delays in kernel execution (for instance, http://devgurus.amd.com/thread/166587), but there doesn't seem to be a resolution there. I've checked that the GPU is not in low-power mode. Has anyone else seen this, or does anyone have suggestions on how to diagnose it?
write: queued 0.000000 submit 0.023312 start 3296.444778 end 3335.371268 | submitted 3296.421466 running 38.926490
exec: queued 0.021067 submit 78.494703 start 3335.371268 end 6529.140138 | submitted 3256.876565 running 3193.768870
read: queued 0.024849 submit 79.085042 start 6529.140158 end 6578.664028 | submitted 6450.055116 running 49.523870
Overall 6583.000000 ms
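For anyone who wants to reproduce the "submitted" and "running" columns, here is a minimal sketch of the arithmetic, not my exact code. It assumes you have already read the four raw timestamps for an event with clGetEventProfilingInfo (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START, CL_PROFILING_COMMAND_END), which are 64-bit nanosecond counters. The names event_times, event_phases, ns_between_ms and split_phases are made up for illustration and are not part of OpenCL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four raw profiling timestamps of one event,
   in nanoseconds, as reported by clGetEventProfilingInfo. */
typedef struct {
    uint64_t queued_ns;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit_ns;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start_ns;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end_ns;     /* CL_PROFILING_COMMAND_END    */
} event_times;

/* Hypothetical result: time spent in each phase, in milliseconds. */
typedef struct {
    double in_queue_ms;   /* queued -> submit: waiting in the host-side queue */
    double submitted_ms;  /* submit -> start:  handed to the device, not yet running */
    double running_ms;    /* start  -> end:    actually executing */
} event_phases;

/* Difference between two nanosecond timestamps, in milliseconds.
   The assert guards against unsigned wrap-around if the timestamps
   are passed in the wrong order. */
double ns_between_ms(uint64_t later_ns, uint64_t earlier_ns)
{
    assert(later_ns >= earlier_ns);
    return (double)(later_ns - earlier_ns) / 1e6;
}

/* Split one event's lifetime into its three phases. */
event_phases split_phases(event_times t)
{
    event_phases p;
    p.in_queue_ms  = ns_between_ms(t.submit_ns, t.queued_ns);
    p.submitted_ms = ns_between_ms(t.start_ns,  t.submit_ns);
    p.running_ms   = ns_between_ms(t.end_ns,    t.start_ns);
    return p;
}
```

Assuming the values printed above are milliseconds relative to a common base, the exec row corresponds to timestamps of 21067, 78494703, 3335371268 and 6529140138 ns, and split_phases on those gives the same submitted and running figures shown in that row. Note that the command queue has to be created with CL_QUEUE_PROFILING_ENABLE for the timestamps to be available at all.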
Follow-up: after upgrading from the latest release drivers (v13.4, released on 5/29/2013) to the latest beta drivers (released 11/22/2013), we're no longer seeing this performance issue. The problem occurred on 64-bit CentOS with an AMD A10-6700, but if you're seeing this issue on a different chipset, I'd still recommend upgrading to the latest beta drivers to see if that fixes it.