When I run the application timeline trace (via CodeXL 1.5.xxx), it shows huge gaps (e.g., 10-12 ms) between kernel executions. I am working on a real-time application that streams data in and out of the GPU, and these gaps are causing the processing to run slower than real time.
To give you some more information, we're using the FirePro S9150 in a Linux x64 environment. There is a series of 7 kernel executions; the first six provide inputs to the final "stage". After the final stage, the process repeats, and this is where the gap usually occurs. The inputs for the next iteration are copied concurrently during the previous iteration's execution, and I can see that this copy has completed properly.
The kernel executions are chained together (ordered) using events. For example, no kernel can execute before its inputs are complete. The final stage is added to the same queue as the rest of the kernels, so it is the last to execute (thus, all of its inputs are prepared and in memory by the time it starts).
I have looked at the profiler and I don't see any reason why the kernel is failing to launch (or, rather, anything changing that could be what finally triggers it to launch).
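
For reference, here is a minimal sketch of how the gaps described above could be quantified outside of CodeXL. It assumes the start and end timestamps of each kernel have already been collected in nanoseconds, for example with `clGetEventProfilingInfo` using `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` on a queue created with `CL_QUEUE_PROFILING_ENABLE`. The function names, the 1 ms threshold, and the sample timestamps are hypothetical and only illustrate the arithmetic; they are not taken from the actual application.

```python
from typing import List, Tuple

NS_PER_MS = 1_000_000  # OpenCL profiling timestamps are reported in nanoseconds


def idle_gaps_ms(intervals: List[Tuple[int, int]]) -> List[float]:
    """Idle time in ms between each kernel's end and the next kernel's start.

    `intervals` holds (start_ns, end_ns) pairs in execution order.
    Overlapping kernels yield a gap of 0.0 rather than a negative value.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        gaps.append(max(0, next_start - prev_end) / NS_PER_MS)
    return gaps


def large_gaps(intervals: List[Tuple[int, int]],
               threshold_ms: float = 1.0) -> List[Tuple[int, float]]:
    """Return (index, gap_ms) for gaps above the threshold.

    Index i refers to the gap that follows kernel i.
    """
    return [(i, g) for i, g in enumerate(idle_gaps_ms(intervals))
            if g > threshold_ms]


# Hypothetical timestamps (not measured data): the final stage of one
# iteration, followed by the first two kernels of the next iteration.
example = [
    (0, 2_000_000),            # final stage, runs from 0 ms to 2 ms
    (13_000_000, 14_000_000),  # next iteration starts 11 ms later
    (14_100_000, 15_000_000),  # 0.1 ms after the previous kernel
]

print(idle_gaps_ms(example))
print(large_gaps(example))
```

Applying the same subtraction to `CL_PROFILING_COMMAND_QUEUED` and `CL_PROFILING_COMMAND_SUBMIT` for the delayed kernel might help show whether the time is lost before the command reaches the device or after it has been submitted.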