cancel
Showing results for 
Search instead for 
Did you mean: 

OpenCL

elad
Adept I

OpenCL 2.0 compiler bug? (device side enqueue)

A similar issue is reported here.

I compile a kernel (kernel1) that performs device-side enqueue to another kernel (kernel2).

When kernel2 is empty, or contains little code, there is no problem. 

But when certain operations occur in kernel2 the clBuildProgram fails when an error:

error: <unknown>:0:0: in function __kernel2_block_invoke void (i8*): local memory global used by non-kernel function

In my case, the trigger is calling atomic_add on a local memory variable, but in the reported issue it is calling work_group_reduce_add.


Is this behavior expected? or is this a bug? is there a workaround?

Here is the full kernels code:

kernel void kernel2(global float* src)
{
   local int just_a_local;

   if (get_local_id(0))
   {
      just_a_local = 0;
   }
   barrier(CLK_LOCAL_MEM_FENCE);

   atomic_add(&just_a_local, 5); // commenting this line resolves the issue!
}

kernel void kernel1(global float* src)
{

   local int just_a_local;

   if (get_local_id(0))
   {
      just_a_local = 0;
   }

   barrier(CLK_LOCAL_MEM_FENCE);

   atomic_add(&just_a_local, 5);

   if (get_global_id(0) == 0)
   {
   enqueue_kernel(
   get_default_queue(),
   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
   ndrange_1D(1),
   ^{
   kernel2(src);
   }

   );
   }

}

Hardware: AMD Radeon Pro W5700, Windows 10 latest driver.

0 Likes
1 Solution
dipak
Big Boss

As I've been informed, OpenCL team has added a workaround that will fix the above issue. It will convert the error to a warning.

Please note that, from the language perspective, a compiler can restrict the above LDS usage.  The spec says that a kernel function called from another kernel is a regular function, and regular functions are not able to declare their own local variables. 

Here is also an informative comment in the spec:

 "Kernel functions with variables declared inside the function with the __local or local qualifier can be called by the host using appropriate APIs such as clEnqueueNDRangeKernel.

This also implies that a kernel cannot call another kernel which declares local variables.

A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory and below enqueue_kernel built-in function variant can be used for this purpose. For more information, please refer the section "Arguments that are a pointer type to local address space" in OpenCL C 2.0 spec

 

int enqueue_kernel(queue_t queue, kernel_enqueue_flags_t flags, const ndrange_t ndrange, void (^block)(local void *, …), uint size0, …)

Thanks.

View solution in original post

8 Replies
elad
Adept I

And just to add a few more details, I tried compiling the same code on a different PC with a different GPU (Radeon Pro WX7100) and the same code compiles just fine.

0 Likes
dipak
Big Boss

Thank you for reporting it. We will look into this and get back to you.

Thanks.

dipak
Big Boss

As I've been informed, OpenCL team has added a workaround that will fix the above issue. It will convert the error to a warning.

Please note that, from the language perspective, a compiler can restrict the above LDS usage.  The spec says that a kernel function called from another kernel is a regular function, and regular functions are not able to declare their own local variables. 

Here is also an informative comment in the spec:

 "Kernel functions with variables declared inside the function with the __local or local qualifier can be called by the host using appropriate APIs such as clEnqueueNDRangeKernel.

This also implies that a kernel cannot call another kernel which declares local variables.

A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory and below enqueue_kernel built-in function variant can be used for this purpose. For more information, please refer the section "Arguments that are a pointer type to local address space" in OpenCL C 2.0 spec

 

int enqueue_kernel(queue_t queue, kernel_enqueue_flags_t flags, const ndrange_t ndrange, void (^block)(local void *, …), uint size0, …)

Thanks.

elad
Adept I

dipak‌,

Thank you so much for your assistance and thorough explanations! I am looking forward to this fix.

I am aware of the enqueue_kernel variant that can allocate LDS memory dynamically. But even if you use it, you still can't use function such as work_group_scan_exclusive_add and call your kernel recursively.

0 Likes

Please note, OpenCL C 2.0 spec says that "recursion is not supported" .

Thanks.

0 Likes
elad
Adept I

dipak‌ has this issue been fixed? I still get the same error on the latest driver.

The same error also appears when invoking workgroup functions...

0 Likes

Sorry to hear this. Let me check with the concerned team whether the fix is available in the public driver. I'll get back to you soon.

Thanks.

0 Likes

It looks like the latest public driver doesn't have the fix, hence you are still getting this error. The fix is expected to be released soon.

Thanks.

0 Likes