Question about OpenCL local work size: local size [256,1] doesn't work with global size [512,500]
Hello! I'm really confused: formally there are no mistakes according to the Khronos OpenCL spec, but I get this error:
cl.enqueue_nd_range_kernel(queue, Hasher_kern, [512,500], [256,1], wait_for=None)
>>pyopencl._cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_WORK_GROUP_SIZE
When I run with local_work_size=None, the kernel runs without error, but then any call to get_local_id(0) returns 0.
This code also works correctly on NVIDIA cards.
Please help!
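In case it helps, here is a minimal sketch of how the limits that govern this error can be queried from PyOpenCL (the tiny kernel in it is only a placeholder, not my real one):

# Minimal sketch: query the limits that govern INVALID_WORK_GROUP_SIZE.
# The kernel body below is just a placeholder for this check.
import pyopencl as cl

platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
print("Device:", device.name)
print("CL_DEVICE_MAX_WORK_GROUP_SIZE:", device.max_work_group_size)
print("CL_DEVICE_MAX_WORK_ITEM_SIZES:", device.max_work_item_sizes)

context = cl.Context([device])
program = cl.Program(context, """
__kernel void test(__global uint *buf){ buf[get_global_id(0)]++; }
""").build()
kern = cl.Kernel(program, 'test')

# The per-kernel limit can be lower than the device-wide one.
print("CL_KERNEL_WORK_GROUP_SIZE:",
      kern.get_work_group_info(cl.kernel_work_group_info.WORK_GROUP_SIZE, device))

As I read the 1.2 spec, the local size must divide the global size evenly in each dimension and its product must not exceed the device or per-kernel work-group limit; [256,1] against [512,500] satisfies the divisibility part, which is why I expected it to be accepted.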
Thank you for reporting it. Please provide a minimal reproducible test case and share your clinfo output and other setup details such as GPU, OS, and driver version.
Thanks.
P.S. You have been whitelisted for the Devgurus community.
Thank you for adding me to the whitelist.
Here is some sample code; I am using Python with PyOpenCL.
import pyopencl as cl
import numpy as np

platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)

kernel = """
__kernel void test(__global uint *test){
    uint img_width = 512;
    uint id = get_global_id(0) + img_width * get_global_id(1);
    test[id]++;
}
"""

program = cl.Program(context, kernel).build()
Test_kern = cl.Kernel(program, 'test')

test_np = np.zeros(512 * 500, dtype=np.uint32)
mf = cl.mem_flags
test_buf = cl.Buffer(context, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=test_np)
Test_kern.set_args(test_buf)

test_event = cl.enqueue_nd_range_kernel(queue, Test_kern, [512, 500], [256, 1])
# test_event = cl.enqueue_nd_range_kernel(queue, Test_kern, [512, 500], local_work_size=None)
# The line above works fine, except that get_local_id(..) in the kernel always returns 0 in any dimension.
test_event.wait()
print("Finish")
Here are my GPU specs:
AMD Radeon Pro 5500M Compute Engine (AMD)
Version: OpenCL 1.2
Type: ALL | GPU
Memory (global): 8573157376
Memory (local): 65536
Address bits: 32
Max work item dims: 3
Max work group size: 256
Max compute units: 24
Driver version: 1.2 (Oct 29 2020 23:01:05)
Image support: True
Little endian: True
Device available: True
Compiler available: True
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_APPLE_command_queue_priority cl_APPLE_command_queue_select_compute_units cl_khr_fp64
I am using macOS Catalina 10.15.7.
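For completeness, a minimal sketch that lists every platform and device PyOpenCL can see, to confirm which one get_devices()[0] actually picks on this machine:

# Minimal sketch: list every platform/device PyOpenCL can see, to check
# which device get_devices()[0] actually returns on this machine.
import pyopencl as cl

for p_idx, platform in enumerate(cl.get_platforms()):
    print("Platform", p_idx, ":", platform.name)
    for d_idx, device in enumerate(platform.get_devices()):
        print("  Device", d_idx, ":", device.name,
              "| max work group size:", device.max_work_group_size)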
Guys, I apologize, everything works correctly on your side; the problem is Intel :) For some reason the integrated video card had first priority, so the kernel was running on it instead of the AMD card. I'm very sorry to have bothered you.
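For anyone who runs into the same thing, here is a minimal sketch of one way to pick the discrete GPU explicitly instead of relying on the device order (the substring check on the device name is an assumption; adjust it for your setup):

# Minimal sketch: pick the discrete GPU by name instead of trusting the
# platform/device order (the Intel iGPU came first on my machine).
# The substring check on the device name is an assumption.
import pyopencl as cl

device = None
for platform in cl.get_platforms():
    for d in platform.get_devices():
        if "AMD" in d.name or "Radeon" in d.name:
            device = d
            break
    if device is not None:
        break

if device is None:
    raise RuntimeError("No AMD/Radeon device found")

context = cl.Context([device])
queue = cl.CommandQueue(context)
print("Using:", device.name)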
