A similar issue is reported here.
I compile a kernel (kernel1) that performs device-side enqueue to another kernel (kernel2).
When kernel2 is empty, or contains little code, there is no problem.
But when certain operations occur in kernel2 the clBuildProgram fails when an error:
error: <unknown>:0:0: in function __kernel2_block_invoke void (i8*): local memory global used by non-kernel function
In my case, the trigger is calling atomic_add on a local memory variable, but in the reported issue it is calling work_group_reduce_add.
Is this behavior expected? or is this a bug? is there a workaround?
Here is the full kernels code:
kernel void kernel2(global float* src)
{
local int just_a_local;if (get_local_id(0))
{
just_a_local = 0;
}
barrier(CLK_LOCAL_MEM_FENCE);atomic_add(&just_a_local, 5); // commenting this line resolves the issue!
}kernel void kernel1(global float* src)
{local int just_a_local;
if (get_local_id(0))
{
just_a_local = 0;
}
barrier(CLK_LOCAL_MEM_FENCE);
atomic_add(&just_a_local, 5);if (get_global_id(0) == 0)
{
enqueue_kernel(
get_default_queue(),
CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
ndrange_1D(1),
^{
kernel2(src);
}
);
}}
Hardware: AMD Radeon Pro W5700, Windows 10 latest driver.
Solved! Go to Solution.
As I've been informed, OpenCL team has added a workaround that will fix the above issue. It will convert the error to a warning.
Please note that, from the language perspective, a compiler can restrict the above LDS usage. The spec says that a kernel function called from another kernel is a regular function, and regular functions are not able to declare their own local variables.
Here is also an informative comment in the spec:
"Kernel functions with variables declared inside the function with the __local or local qualifier can be called by the host using appropriate APIs such as clEnqueueNDRangeKernel."
This also implies that a kernel cannot call another kernel which declares local variables.
A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory and below enqueue_kernel built-in function variant can be used for this purpose. For more information, please refer the section "Arguments that are a pointer type to local address space" in OpenCL C 2.0 spec
int enqueue_kernel(queue_t queue, kernel_enqueue_flags_t flags, const ndrange_t ndrange, void (^block)(local void *, …), uint size0, …)
Thanks.
And just to add a few more details, I tried compiling the same code on a different PC with a different GPU (Radeon Pro WX7100) and the same code compiles just fine.
Thank you for reporting it. We will look into this and get back to you.
Thanks.
As I've been informed, OpenCL team has added a workaround that will fix the above issue. It will convert the error to a warning.
Please note that, from the language perspective, a compiler can restrict the above LDS usage. The spec says that a kernel function called from another kernel is a regular function, and regular functions are not able to declare their own local variables.
Here is also an informative comment in the spec:
"Kernel functions with variables declared inside the function with the __local or local qualifier can be called by the host using appropriate APIs such as clEnqueueNDRangeKernel."
This also implies that a kernel cannot call another kernel which declares local variables.
A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory and below enqueue_kernel built-in function variant can be used for this purpose. For more information, please refer the section "Arguments that are a pointer type to local address space" in OpenCL C 2.0 spec
int enqueue_kernel(queue_t queue, kernel_enqueue_flags_t flags, const ndrange_t ndrange, void (^block)(local void *, …), uint size0, …)
Thanks.
dipak,
Thank you so much for your assistance and thorough explanations! I am looking forward to this fix.
I am aware of the enqueue_kernel variant that can allocate LDS memory dynamically. But even if you use it, you still can't use function such as work_group_scan_exclusive_add and call your kernel recursively.
Please note, OpenCL C 2.0 spec says that "recursion is not supported" .
Thanks.
dipak has this issue been fixed? I still get the same error on the latest driver.
The same error also appears when invoking workgroup functions...
Sorry to hear this. Let me check with the concerned team whether the fix is available in the public driver. I'll get back to you soon.
Thanks.
It looks like the latest public driver doesn't have the fix, hence you are still getting this error. The fix is expected to be released soon.
Thanks.