1 Reply Latest reply on Nov 28, 2013 11:52 PM by jmaxg3

    Large submit->start delay in OpenCL kernel on AMD APU (running on GPU)


      I'm seeing massive delays between a kernel being submitted to an AMD GPU and it actually executing. My program does blocking writes and reads (blocking=CL_TRUE) to ensure that I/O isn't interfering with the kernel. I then use clGetEventProfilingInfo to get the queued, submit, start and end times of each command. The data (and code) below show that the kernel spends about 3 seconds in the submitted state and then about 3 seconds running. In general, the submitted time appears to scale with the running time. I've looked at a number of forum posts about delays in kernel execution (for instance, http://devgurus.amd.com/thread/166587), but there doesn't seem to be a resolution there. I've checked that the GPU is not in low-power mode. Has anyone else seen this, or does anyone have suggestions for how to diagnose it?

      Full code is on Pastebin.com (C source; the link title reads "#include <stdio.h>  #include <stdlib.h>  #include <CL/cl.h>  #include <sys/timeb").

      Sample run (all times in ms):

        write: queued 0.000000 submit 0.023312 start 3296.444778 end 3335.371268 | submitted 3296.421466 running 38.926490
        exec: queued 0.021067 submit 78.494703 start 3335.371268 end 6529.140138 | submitted 3256.876565 running 3193.768870
        read: queued 0.024849 submit 79.085042 start 6529.140158 end 6578.664028 | submitted 6450.055116 running 49.523870
        Overall 6583.000000 ms
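
      For anyone checking the numbers: the "submitted" and "running" columns are just differences of the four profiling timestamps (submitted = start - submit, running = end - start). Below is a minimal sketch of that arithmetic in plain C, not the code from the Pastebin link. The struct and function names (event_times_ns, submitted_ns, running_ns, ns_to_ms) are made up for illustration. It assumes the four values were already read with clGetEventProfilingInfo (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START, _END), which reports nanoseconds as cl_ulong, and that they have been rebased to a common origin.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the four profiling counters of one command.
 * All values are in nanoseconds, as clGetEventProfilingInfo reports them,
 * assumed here to be rebased to a common origin. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} event_times_ns;

/* Time between submission to the device and the start of execution. */
uint64_t submitted_ns(const event_times_ns *t) {
    return t->start - t->submit;
}

/* Time between the start and the end of execution. */
uint64_t running_ns(const event_times_ns *t) {
    return t->end - t->start;
}

/* Convert nanoseconds to milliseconds for printing. */
double ns_to_ms(uint64_t ns) {
    return (double)ns / 1.0e6;
}

/* Print one command in the same layout as the sample run. */
void print_event(const char *label, const event_times_ns *t) {
    printf("%s: queued %f submit %f start %f end %f | submitted %f running %f\n",
           label,
           ns_to_ms(t->queued), ns_to_ms(t->submit),
           ns_to_ms(t->start), ns_to_ms(t->end),
           ns_to_ms(submitted_ns(t)), ns_to_ms(running_ns(t)));
}
```

      Feeding in the exec row from the sample run (submit 78494703 ns, start 3335371268 ns, end 6529140138 ns) gives the same submitted and running values as the log. Note that the profiling queries only return valid data if the command queue was created with CL_QUEUE_PROFILING_ENABLE and the event has completed.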