
HSA: Kernel dispatch latencies (What are your experiences?)

Question asked by h_l on Jan 26, 2016



I'm currently working on reducing the dispatch times of HSA kernels to enable fine-grained offloading.

At the moment I see latencies of about 1 microsecond (using some tricks; see below for details).


I would be very interested in the experiences of other HSA developers.

- What dispatch latencies do you encounter?

- Have you found any tricks, hacks, or optimizations to reduce latencies?


Many thanks in advance.



Setup: In my experiments I use a simple "do nothing" kernel. I disable interrupt handling (env HSA_ENABLE_INTERRUPT=0 <hsa_app>) and use busy-waiting instead. In addition, the iGPU's frequency is pinned to 720 MHz.


A synchronous dispatch of a single kernel takes ~7 microseconds (time until the application receives the completion signal, including the AQL enqueue).

Dispatching multiple kernels in batches hides some of the latency: ~3.5 microseconds.

Dispatching a busy-waiting kernel in advance, keeping it running, and communicating with it via atomics reduces the latency to ~1 microsecond.
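
To make the last variant concrete, here is a minimal sketch of the handshake pattern. It is a CPU-only analogy and not HSA code: a std::thread stands in for the pre-dispatched busy-wait kernel, and ordinary process memory stands in for fine-grained shared memory. The names Mailbox, persistent_worker, offload_once and mean_round_trip_ns are hypothetical and chosen only for illustration, and the numbers it produces say nothing about GPU dispatch latency.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Mailbox shared by the "host" and the long-running worker.
// Assumption: on HSA hardware this would live in memory visible to both
// CPU and GPU; here it is plain process memory.
struct Mailbox {
    std::atomic<std::uint64_t> request{0};   // host writes the newest ticket
    std::atomic<std::uint64_t> response{0};  // worker writes the last finished ticket
    std::atomic<bool> stop{false};           // host asks the worker to exit
};

// Stand-in for the kernel that was dispatched in advance: it spins until a
// new ticket shows up, runs the (empty) payload, and acknowledges the ticket.
void persistent_worker(Mailbox& mb) {
    std::uint64_t seen = 0;
    while (!mb.stop.load(std::memory_order_acquire)) {
        const std::uint64_t req = mb.request.load(std::memory_order_acquire);
        if (req != seen) {
            seen = req;  // the "do nothing" payload would run here
            mb.response.store(req, std::memory_order_release);
        }
    }
}

// Host side: post one ticket and busy-wait for its acknowledgement.
// Returns the round-trip time in nanoseconds.
double offload_once(Mailbox& mb, std::uint64_t ticket) {
    const auto t0 = std::chrono::steady_clock::now();
    mb.request.store(ticket, std::memory_order_release);
    while (mb.response.load(std::memory_order_acquire) != ticket) {
        // spin: no interrupt, no sleep
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}

// Starts the worker once, performs `iterations` round trips, and returns
// the mean round-trip time in nanoseconds.
double mean_round_trip_ns(std::uint64_t iterations) {
    Mailbox mb;
    std::thread worker(persistent_worker, std::ref(mb));
    double total = 0.0;
    for (std::uint64_t i = 1; i <= iterations; ++i) {
        total += offload_once(mb, i);
    }
    mb.stop.store(true, std::memory_order_release);
    worker.join();
    return total / static_cast<double>(iterations);
}
```

Calling mean_round_trip_ns(100000) from main and printing the result gives the mean handshake time on the CPU. The point of the pattern is that the worker is started only once, so each offload costs one atomic store plus one spin on an atomic load rather than a full enqueue, launch and completion signal. The price is that the worker occupies its execution resource for as long as it spins.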