cancel
Showing results for 
Search instead for 
Did you mean: 

OpenCL

elad
Adept I
Adept I

OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

A similar issue is reported here.

I compile a kernel (kernel1) that performs device-side enqueue to another kernel (kernel2).

When kernel2 is empty, or contains little code, there is no problem. 

But when certain operations occur in kernel2 the clBuildProgram fails when an error:

error: <unknown>:0:0: in function __kernel2_block_invoke void (i8*): local memory global used by non-kernel function

In my case, the trigger is calling atomic_add on a local memory variable, but in the reported issue it is calling work_group_reduce_add.


Is this behavior expected? or is this a bug? is there a workaround?

Here is the full kernels code:

kernel void kernel2(global float* src)
{
   local int just_a_local;

   if (get_local_id(0))
   {
      just_a_local = 0;
   }
   barrier(CLK_LOCAL_MEM_FENCE);

   atomic_add(&just_a_local, 5); // commenting this line resolves the issue!
}

kernel void kernel1(global float* src)
{

   local int just_a_local;

   if (get_local_id(0))
   {
      just_a_local = 0;
   }

   barrier(CLK_LOCAL_MEM_FENCE);

   atomic_add(&just_a_local, 5);

   if (get_global_id(0) == 0)
   {
   enqueue_kernel(
   get_default_queue(),
   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
   ndrange_1D(1),
   ^{
   kernel2(src);
   }

   );
   }

}

Hardware: AMD Radeon Pro W5700, Windows 10 latest driver.

0 Kudos
Reply
1 Solution

Accepted Solutions
dipak
Staff
Staff

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

As I've been informed, OpenCL team has added a workaround that will fix the above issue. It will convert the error to a warning.

Please note that, from the language perspective, a compiler can restrict the above LDS usage.  The spec says that a kernel function called from another kernel is a regular function, and regular functions are not able to declare their own local variables. 

Here is also an informative comment in the spec:

 "Kernel functions with variables declared inside the function with the __local or local qualifier can be called by the host using appropriate APIs such as clEnqueueNDRangeKernel.

This also implies that a kernel cannot call another kernel which declares local variables.

A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory and below enqueue_kernel built-in function variant can be used for this purpose. For more information, please refer the section "Arguments that are a pointer type to local address space" in OpenCL C 2.0 spec

 

int enqueue_kernel(queue_t queue, kernel_enqueue_flags_t flags, const ndrange_t ndrange, void (^block)(local void *, …), uint size0, …)

Thanks.

View solution in original post

8 Replies
elad
Adept I
Adept I

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

And just to add a few more details, I tried compiling the same code on a different PC with a different GPU (Radeon Pro WX7100) and the same code compiles just fine.

0 Kudos
Reply
dipak
Staff
Staff

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

Thank you for reporting it. We will look into this and get back to you.

Thanks.

dipak
Staff
Staff

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

As I've been informed, OpenCL team has added a workaround that will fix the above issue. It will convert the error to a warning.

Please note that, from the language perspective, a compiler can restrict the above LDS usage.  The spec says that a kernel function called from another kernel is a regular function, and regular functions are not able to declare their own local variables. 

Here is also an informative comment in the spec:

 "Kernel functions with variables declared inside the function with the __local or local qualifier can be called by the host using appropriate APIs such as clEnqueueNDRangeKernel.

This also implies that a kernel cannot call another kernel which declares local variables.

A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory and below enqueue_kernel built-in function variant can be used for this purpose. For more information, please refer the section "Arguments that are a pointer type to local address space" in OpenCL C 2.0 spec

 

int enqueue_kernel(queue_t queue, kernel_enqueue_flags_t flags, const ndrange_t ndrange, void (^block)(local void *, …), uint size0, …)

Thanks.

View solution in original post

elad
Adept I
Adept I

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

dipak‌,

Thank you so much for your assistance and thorough explanations! I am looking forward to this fix.

I am aware of the enqueue_kernel variant that can allocate LDS memory dynamically. But even if you use it, you still can't use function such as work_group_scan_exclusive_add and call your kernel recursively.

0 Kudos
Reply
dipak
Staff
Staff

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

Please note, OpenCL C 2.0 spec says that "recursion is not supported" .

Thanks.

0 Kudos
Reply
elad
Adept I
Adept I

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

dipak‌ has this issue been fixed? I still get the same error on the latest driver.

The same error also appears when invoking workgroup functions...

0 Kudos
Reply
dipak
Staff
Staff

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

Sorry to hear this. Let me check with the concerned team whether the fix is available in the public driver. I'll get back to you soon.

Thanks.

0 Kudos
Reply
dipak
Staff
Staff

Re: OpenCL 2.0 compiler bug? (device side enqueue)

Jump to solution

It looks like the latest public driver doesn't have the fix, hence you are still getting this error. The fix is expected to be released soon.

Thanks.

0 Kudos
Reply