5 Replies Latest reply on Apr 27, 2015 2:42 PM by jedwards

    HSA-HLC stable and enqueue_kernel

    vladimir_1

      I have two questions:

           HSAIL-HLC-Stable generates

      kernarg_u64 %__vqueue_pointer,
      kernarg_u64 %__aqlwrap_pointer,

      which I guess are used to pass queue information to the kernel. Is there any documentation or a sample for them?

        • Re: HSA-HLC stable and enqueue_kernel
          jedwards

          The target for the updated compiler is the end of May, but it could occur sooner.


          Look at this sample to see how to enqueue a kernel in the HSA runtime: CLOC/examples/hsa/vector_copy (HSAFoundation/CLOC on GitHub)

            • Re: HSA-HLC stable and enqueue_kernel
              vladimir_1

              Hi,

              Thanks for the ETA.

               

              As for the sample, unfortunately it has two problems. First, it still uses the provisional API.

              Second, it does not demonstrate an enqueue_kernel call inside the kernel, and it does not explain how vqueue_pointer and aqlwrap_pointer in the kernel parameters should be filled in, as it simply zeroes them out:

              #ifdef DUMMY_ARGS
                  // This flag should be set if HSA_HLC_Stable is used,
                  // because the high-level compiler generates 6 extra args.
                  kernel_arg_start_offset += sizeof(uint64_t) * 6;
                  printf("Using dummy args \n");
              #endif
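
For readers following along, here is a minimal C sketch of what that offset adjustment amounts to. It assumes, going only by the sample's comment, that the compiler prepends six hidden 64-bit arguments and that leaving them zeroed is acceptable when the kernel never touches them. The helper name pack_with_hidden_args is hypothetical and is not part of the sample or of the HSA runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption from the sample's comment: six hidden 64-bit arguments. */
enum { HIDDEN_ARG_COUNT = 6 };

/*
 * Hypothetical helper: zero the hidden argument slots at the start of the
 * kernarg buffer and copy the user's arguments right after them.
 * Returns the number of bytes used, or 0 if the buffer is too small.
 */
size_t pack_with_hidden_args(void *kernarg, size_t kernarg_size,
                             const void *user_args, size_t user_size)
{
    const size_t hidden_bytes = sizeof(uint64_t) * HIDDEN_ARG_COUNT; /* 48 */

    if (kernarg == NULL || user_size > SIZE_MAX - hidden_bytes)
        return 0;
    if (kernarg_size < hidden_bytes + user_size)
        return 0;

    /* Hidden slots (including the vqueue and aqlwrap pointers) stay zero. */
    memset(kernarg, 0, hidden_bytes);

    /* User arguments start at the adjusted offset. */
    if (user_size != 0) {
        assert(user_args != NULL);
        memcpy((uint8_t *)kernarg + hidden_bytes, user_args, user_size);
    }
    return hidden_bytes + user_size;
}
```

With two 64-bit user arguments the helper uses 64 bytes: 48 for the hidden slots and 16 for the user data. This only mirrors the skipping that the sample does; it says nothing about what a device-side enqueue would need in those slots.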
            • Re: HSA-HLC stable and enqueue_kernel
              vladimir_1

              Hmm, but is it at least possible to post what is supposed to go into kernarg_u64 %__vqueue_pointer and kernarg_u64 %__aqlwrap_pointer?

                • Re: HSA-HLC stable and enqueue_kernel
                  jedwards

                  Sure. In our test case we pass both the queue pointer and a completion signal to the kernel. We pass the completion signal because kernels don't currently have the ability to create signals. The host-side structure used to 'pack' the kernel arguments looks like this:

                  .

                  struct dispatch_parms {
                      hsa_queue_t* queue;
                      hsa_signal_t signal;
                  };

                  .
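
As a rough illustration of the packing step, the sketch below copies such a structure into a kernarg buffer. The two typedefs are stand-ins so that it compiles without the HSA headers (a real build would include hsa.h, and the 64-bit signal handle layout is an assumption here). The helper name pack_dispatch_parms is hypothetical. The sketch also assumes a 64-bit host, so that the two fields line up with two 64-bit kernel arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins so the sketch builds without hsa.h; both are assumptions. */
typedef struct hsa_queue_s hsa_queue_t;            /* opaque in this sketch */
typedef struct { uint64_t handle; } hsa_signal_t;  /* assumed 64-bit handle */

struct dispatch_parms {
    hsa_queue_t *queue;
    hsa_signal_t signal;
};

/* On a 64-bit host the fields should sit at offsets 0 and 8. */
_Static_assert(offsetof(struct dispatch_parms, queue) == 0, "queue offset");
_Static_assert(offsetof(struct dispatch_parms, signal) == 8, "signal offset");
_Static_assert(sizeof(struct dispatch_parms) == 16, "two 64-bit kernargs");

/*
 * Hypothetical helper: copy the queue pointer and the signal handle into
 * the kernarg buffer. Returns the bytes written, or 0 if it does not fit.
 */
size_t pack_dispatch_parms(void *kernarg, size_t kernarg_size,
                           hsa_queue_t *queue, hsa_signal_t signal)
{
    struct dispatch_parms p;

    if (kernarg == NULL || kernarg_size < sizeof p)
        return 0;

    memset(&p, 0, sizeof p);
    p.queue = queue;
    p.signal = signal;
    memcpy(kernarg, &p, sizeof p);
    return sizeof p;
}
```

In a real program the destination buffer would be memory that the agent can read as kernel arguments, and its address would go into the dispatch packet; the sketch only shows the byte layout.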

                  A kernel using the large profile would have the following signature:

                   

                  prog kernel &__agent_dispatch_kernel(kernarg_u64 %queue, kernarg_u64 %signal) {
                  @__agent_dispatch_kernel_entry:

                      // Load the queue pointer
                      ld_kernarg_align(8)_width(all)_u64  $d0, [%queue];

                      // Load the signal handle
                      ld_kernarg_align(8)_width(all)_sig64  $d1, [%signal];

                      // Increment the queue's write index by the amount specified in
                      // the $d2 register. Store the original write index value back into
                      // the $d2 register.
                      addqueuewriteindex_global_scar_u64 $d2, $d0, $d2;
                      .
                      . <Do other things to the queue: See section 11.3 of the HSAIL programming guide>
                      .
                      // Wait until the signal's ($d1) value is equal to the value in $d2.
                      // Store the returned value back into $d2.
                      signal_wait_eq_rlx_s64_sig64 $d2, $d1, $d2;
                      .
                      . <Do other things to the signal: See section 6.8 of the HSAIL programming guide>
                      .
                      ret;
                  }
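
To make the write-index step concrete, here is a small host-side model in C11 of the behaviour described in the comments: an atomic add that hands back the original index. This is a sketch only. The model_queue_t type and both function names are hypothetical, the default sequentially consistent ordering is used as a conservative stand-in for the ordering in the kernel, and mapping an index to a packet slot assumes the queue size is a power of two.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two queue fields this sketch needs. */
typedef struct {
    _Atomic uint64_t write_index;  /* monotonically increasing index */
    uint32_t size;                 /* number of packet slots, power of two */
} model_queue_t;

/*
 * Reserve `count` packet slots: atomically add to the write index and
 * return the value it held before the add (the first reserved index).
 */
uint64_t model_add_write_index(model_queue_t *q, uint64_t count)
{
    return atomic_fetch_add(&q->write_index, count);
}

/* Map a write index to a slot in the ring buffer. */
uint32_t model_slot_for(const model_queue_t *q, uint64_t index)
{
    assert(q->size != 0 && (q->size & (q->size - 1)) == 0);
    return (uint32_t)(index & (uint64_t)(q->size - 1));
}
```

A caller that reserves three slots on a four-slot queue whose write index is 6 gets back 6, and would then fill slots 2, 3 and 0 in turn.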