0 Replies Latest reply on Jan 26, 2016 5:11 AM by h_l

    HSA: Kernel dispatch latencies (What are your experiences?)

    h_l

      Hi,

       

      I'm currently working on reducing the dispatch times of HSA kernels to enable fine-grained offloading.

      At the moment I see latencies of about 1 microsecond (using some tricks; see below for details).

       

      I would be very interested in the experiences of other HSA developers.

      - What dispatch latencies do you encounter?

      - Have you found any tricks, hacks, or optimizations to reduce latencies?

       

      Many thanks in advance.

       

      ---

      Setup: In my experiments I use a simple "do nothing" kernel. I disable interrupt handling (env HSA_ENABLE_INTERRUPT=0 <hsa_app>) and use busy-waiting instead. In addition, the iGPU's frequency is pinned to 720 MHz.
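
For readers who want to reproduce this kind of measurement, here is a minimal sketch of how a dispatch-plus-busy-wait round trip could be timed. It is not the code behind the numbers in this post. The function name `mean_latency_us` and the `completion` flag are hypothetical, and the `dispatch` callable is only a stand-in for "write the AQL packet and ring the doorbell"; it is assumed to cause `completion` to drop to 0 when the work is done.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical timing harness: returns the mean round-trip time in
// microseconds over `iterations` synchronous dispatches.
// `dispatch` is a placeholder for the real submission path; it must
// (directly or indirectly) set `completion` to 0 when the work finishes.
template <typename Dispatch>
double mean_latency_us(Dispatch&& dispatch,
                       std::atomic<int>& completion,
                       int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        completion.store(1, std::memory_order_relaxed);  // arm the flag
        dispatch();                                      // submit the work
        while (completion.load(std::memory_order_acquire) != 0) {
            // busy-wait: spin on the flag instead of sleeping on an interrupt
        }
    }
    const auto stop = clock::now();

    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / iterations;
}
```

Calling it with a lambda that simply stores 0 into the flag measures the overhead of the harness itself, which gives a baseline to subtract from real measurements. Averaging over many iterations keeps timer resolution from dominating at the microsecond scale.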

       

      A synchronous dispatch of a single kernel takes ~7 microseconds (time until the application receives the completion signal, including the AQL enqueue).

      Dispatching multiple kernels in batches can hide some of the latency, bringing it down to 3.5 microseconds.

      Dispatching and running a busy-wait kernel in advance and communicating with it via atomics reduces latencies to ~1 microsecond.
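
The pre-launched kernel approach can be illustrated with a host-only analogue. In the sketch below an ordinary CPU thread plays the role of the busy-wait kernel, and `std::atomic` stands in for the atomics shared between host and GPU. All names (`Mailbox`, `worker_loop`, `submit_and_wait`) are hypothetical, the "kernel body" is a placeholder, and an actual HSA version would poll shared memory from a kernel rather than from a thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical mailbox shared by the submitter and the pre-launched worker.
struct Mailbox {
    std::atomic<std::uint32_t> request{0};   // ticket of the latest request
    std::atomic<std::uint32_t> response{0};  // ticket of the latest completion
    std::atomic<bool> stop{false};           // asks the worker to exit
    std::uint64_t payload = 0;  // plain data, ordered by the atomics above
    std::uint64_t result = 0;
};

// Stand-in for the busy-wait kernel: started once, then polls for work.
void worker_loop(Mailbox& mb) {
    std::uint32_t seen = 0;
    while (true) {
        const std::uint32_t req = mb.request.load(std::memory_order_acquire);
        if (req != seen) {
            mb.result = mb.payload + 1;  // placeholder for the real work
            seen = req;
            mb.response.store(req, std::memory_order_release);
        } else if (mb.stop.load(std::memory_order_acquire)) {
            return;
        }
    }
}

// Post one work item and spin until the worker reports completion.
// Assumes a single submitting thread.
std::uint64_t submit_and_wait(Mailbox& mb, std::uint64_t value) {
    mb.payload = value;
    const std::uint32_t ticket =
        mb.request.load(std::memory_order_relaxed) + 1;
    mb.request.store(ticket, std::memory_order_release);
    while (mb.response.load(std::memory_order_acquire) != ticket) {
        // busy-wait for the matching completion ticket
    }
    return mb.result;
}
```

To use it, start `worker_loop` on a `std::thread` once, call `submit_and_wait` for each work item, then set `stop` and join the thread. The point of the design is that each request costs only two atomic handoffs, because the launch cost is paid once up front; the trade-off is that the worker occupies its execution resources for as long as it spins.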