I have two questions:
HSAIL-HLC-Stable generates
kernarg_u64 %__vqueue_pointer,
kernarg_u64 %__aqlwrap_pointer,
which I guess are used to pass queue information to the kernel. Is there any documentation or a sample for them?
The target for the updated compiler is the end of May, but it could occur sooner.
Look at this sample to see how to enqueue a kernel in the HSA runtime: CLOC/examples/hsa/vector_copy at master · HSAFoundation/CLOC · GitHub
Hi,
Thanks for the ETA.
As for the sample, unfortunately it has two problems. First, it still uses the provisional API.
Second, it does not demonstrate an enqueue_kernel call inside the kernel, and it does not explain how vqueue_pointer and aqlwrap_pointer in the kernel parameters should be filled in, as it simply zeroes them out:
#ifdef DUMMY_ARGS
// This flag should be set if HSA_HLC_Stable is used.
// This is because the high-level compiler generates 6 extra args.
kernel_arg_start_offset += sizeof(uint64_t) * 6;
printf("Using dummy args \n");
#endif
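To make the layout implied by that snippet concrete, here is a minimal sketch in C. It assumes the six compiler-generated arguments occupy the first six 64-bit slots of the kernarg buffer and that the user arguments follow them, which is what the offset arithmetic in the sample suggests. The helper name fill_kernargs and the EXTRA_ARGS constant are made up for illustration and are not part of the sample or the HSA runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count taken from the comment in the sample; hypothetical constant name. */
enum { EXTRA_ARGS = 6 };

/* Hypothetical helper: zero the whole buffer (so the extra slots stay 0, as
 * the sample does), then copy the user arguments after the extra slots.
 * Returns the offset of the user arguments, or -1 if the buffer is too small. */
static long fill_kernargs(uint8_t *buf, size_t buf_size,
                          const void *user_args, size_t user_size)
{
    size_t kernel_arg_start_offset = sizeof(uint64_t) * EXTRA_ARGS;

    if (buf == NULL || buf_size < kernel_arg_start_offset + user_size)
        return -1;

    memset(buf, 0, buf_size);
    memcpy(buf + kernel_arg_start_offset, user_args, user_size);
    return (long)kernel_arg_start_offset;
}
```

This only reproduces the zero-filled case from the sample; it says nothing about what real values the extra slots would need for device-side enqueue.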
I understand now. You wanted a sample that shows how to actually perform agent dispatch. We currently do not have an updated sample that shows how to do that, although we have tested it. The test cannot be published until it goes through our legal process.
Hmm, but is it at least possible to post what is supposed to go into kernarg_u64 %__vqueue_pointer and kernarg_u64 %__aqlwrap_pointer?
Sure. In our test case we pass both the queue pointer and a completion signal to the kernel. We pass the completion signal because kernels don't currently have the ability to create signals. The host side structure used to 'pack' the kernel arguments looks like this:
.
struct dispatch_parms {
    hsa_queue_t* queue;
    hsa_signal_t signal;
};
.
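For readers who want to see how such a structure lines up with the two kernarg_u64 parameters, here is a rough sketch in C. The type definitions are stand-ins so the snippet compiles without hsa.h, and they assume a 64-bit host where the signal is a 64-bit opaque handle. pack_dispatch_parms is a hypothetical helper, not an HSA runtime call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the HSA runtime types (assumption: 64-bit signal handle). */
typedef struct hsa_queue_s hsa_queue_t;            /* opaque in this sketch */
typedef struct { uint64_t handle; } hsa_signal_t;

struct dispatch_parms {
    hsa_queue_t *queue;   /* first 64-bit slot:  kernarg_u64 %queue  */
    hsa_signal_t signal;  /* second 64-bit slot: kernarg_u64 %signal */
};

/* The kernel signature expects two consecutive 64-bit arguments. */
static_assert(offsetof(struct dispatch_parms, signal) == sizeof(uint64_t),
              "signal must sit in the second 64-bit slot");

/* Hypothetical helper: copy the packed arguments into a kernarg buffer.
 * Returns the number of bytes written, or 0 if the buffer is unusable. */
static size_t pack_dispatch_parms(void *kernarg, size_t kernarg_size,
                                  hsa_queue_t *queue, hsa_signal_t signal)
{
    struct dispatch_parms parms;

    if (kernarg == NULL || kernarg_size < sizeof parms)
        return 0;

    memset(&parms, 0, sizeof parms);
    parms.queue = queue;
    parms.signal = signal;
    memcpy(kernarg, &parms, sizeof parms);
    return sizeof parms;
}
```

In a real program the buffer would come from a kernarg memory region and the queue and signal from the runtime; those steps are left out here.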
A kernel using the large profile would have the following signature:
prog kernel &__agent_dispatch_kernel(kernarg_u64 %queue, kernarg_u64 %signal) {
@__agent_dispatch_kernel_entry:
// Load the queue pointer
ld_kernarg_align(8)_width(all)_u64 $d0, [%queue];
// Load the signal handle
ld_kernarg_align(8)_width(all)_sig64 $d1, [%signal];
// Increment the queue's write index by the amount specified in
// the $d2 register. Store the original write index value back into
// the $d2 register.
addqueuewriteindex_global_scar_u64 $d2, $d0, $d2;
.
. <Do other things to the queue: See section 11.3 of the HSAIL programming guide>
.
// Wait until the signal's ($d1) value is equal to the value in $d2
// Store the returned value back into $d2
signal_wait_eq_rlx_s64_sig64 $d2, $d1, $d2;
.
.<Do other things to the signal: See section 6.8 of the HSAIL programming guide>
.
ret;
}
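As a host-side analogy for the two operations described in the kernel comments, the sketch below models them with C11 atomics. It is not HSAIL and not the HSA runtime: both function names are invented for illustration, the acquire-release ordering is only an approximation of the scar order, and the wait is bounded by a poll count so the sketch always returns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Models the addqueuewriteindex comment: add `count` to the write index
 * and return the value the index held before the addition. */
static uint64_t add_queue_write_index(_Atomic uint64_t *write_index,
                                      uint64_t count)
{
    return atomic_fetch_add_explicit(write_index, count,
                                     memory_order_acq_rel);
}

/* Models the signal_wait_eq comment: poll until the signal equals
 * `expected` or `max_polls` extra reads have been made. Returns the last
 * value observed, so the caller must compare it with `expected`. */
static int64_t signal_wait_eq(_Atomic int64_t *signal, int64_t expected,
                              unsigned max_polls)
{
    int64_t seen = atomic_load_explicit(signal, memory_order_relaxed);

    while (seen != expected && max_polls-- > 0)
        seen = atomic_load_explicit(signal, memory_order_relaxed);

    return seen;
}
```

The value returned by add_queue_write_index is the index of the first packet slot reserved by the caller, which is why the kernel keeps it in $d2.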