Hello! I'm really confused, because as far as I can tell this call is valid according to the Khronos OpenCL specification. But I get this error:
cl.enqueue_nd_range_kernel(queue, Hasher_kern, [512, 500], [256, 1], wait_for=None)
>> pyopencl._cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_WORK_GROUP_SIZE
When I run with local_work_size=None, the kernel runs without error.
But then every call to get_local_id(0) returns 0.
Also this code works correctly on Nvidia cards.
Please help!
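For readers hitting the same error: under OpenCL 1.x, clEnqueueNDRangeKernel reports INVALID_WORK_GROUP_SIZE when an explicit local size does not evenly divide the global size or when the work-group holds more work-items than the device allows (in practice the per-kernel limit can be lower than the device limit). A local size that exceeds the per-dimension limit is reported as INVALID_WORK_ITEM_SIZE instead. The sketch below checks these conditions in plain Python before a launch. The function name `work_group_problems` and the limit values in the example calls are hypothetical; with PyOpenCL the real limits would come from `device.max_work_group_size` and `device.max_work_item_sizes`.

```python
from math import prod


def work_group_problems(global_size, local_size,
                        max_work_group_size, max_work_item_sizes):
    """Return a list of reasons a launch would likely be rejected (empty if none)."""
    problems = []
    if len(global_size) != len(local_size):
        return ["global and local sizes have a different number of dimensions"]

    for dim, (g, l) in enumerate(zip(global_size, local_size)):
        # OpenCL 1.x requires each global size to be a multiple of the local size.
        if l <= 0 or g % l != 0:
            problems.append(
                f"dim {dim}: global size {g} is not a multiple of local size {l}")
        # Per-dimension limit (reported by OpenCL as INVALID_WORK_ITEM_SIZE).
        if l > max_work_item_sizes[dim]:
            problems.append(
                f"dim {dim}: local size {l} exceeds limit {max_work_item_sizes[dim]}")

    # Total work-items per group must fit within the work-group size limit.
    total = prod(local_size)
    if total > max_work_group_size:
        problems.append(
            f"work-group of {total} items exceeds limit {max_work_group_size}")
    return problems


# Limits as listed for a device that allows 256 work-items per group: no problems.
print(work_group_problems([512, 500], [256, 1], 256, [256, 256, 256]))
# Hypothetical device with a smaller limit: the same launch is flagged.
print(work_group_problems([512, 500], [256, 1], 128, [128, 128, 128]))
```

The same launch parameters can therefore be valid on one device and invalid on another, which is why it helps to check which device the context was actually created for.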
Thank you for reporting it. Please provide a minimal reproducible test case and share the clinfo output and other setup details such as GPU, OS, and driver version.
Thanks.
P.S. You have been whitelisted for Devgurus community.
Thank you for adding me to the whitelist.
Here is some sample code. I am using Python with PyOpenCL.
import pyopencl as cl
import numpy as np

# Takes whatever device the first platform lists first
platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)

kernel = """
__kernel void test(__global uint *test){
    uint img_width=512;
    uint id=get_global_id(0)+img_width*get_global_id(1);
    test[id]++;
}
"""

program = cl.Program(context, kernel).build()
Test_kern = cl.Kernel(program, 'test')
test_np=np.zeros(512*500,dtype=np.uint32)
mf = cl.mem_flags
test_buf = cl.Buffer(context, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=test_np)
Test_kern.set_args(test_buf)
test_event=cl.enqueue_nd_range_kernel(queue, Test_kern,[512,500],[256,1])
# The variant below works fine, except that get_local_id(..) in the kernel always returns 0 in every dimension:
#test_event=cl.enqueue_nd_range_kernel(queue, Test_kern,[512,500],local_work_size=None)
test_event.wait()
print("Finish")
Here are my GPU specs:
AMD Radeon Pro 5500M Compute Engine (AMD)
Version: OpenCL 1.2
Type: ALL | GPU
Memory (global): 8573157376
Memory (local): 65536
Address bits: 32
Max work item dims: 3
Max work group size: 256
Max compute units: 24
Driver version: 1.2 (Oct 29 2020 23:01:05)
Image support: True
Little endian: True
Device available: True
Compiler available: True
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_APPLE_command_queue_priority cl_APPLE_command_queue_select_compute_units cl_khr_fp64
I am using macOS Catalina 10.15.7.
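A listing like the one above describes a single device, while a platform may expose several. A minimal sketch for printing the position and work-group limits of every device is shown here. The helper names `describe_device` and `describe_devices` are hypothetical, and the stand-in objects carry made-up values so the sketch runs without an OpenCL runtime; the attribute names `name`, `max_work_group_size`, and `max_work_item_sizes` match those of PyOpenCL device objects.

```python
from types import SimpleNamespace


def describe_device(index, device):
    """Format one line with the device's position and work-group limits."""
    return (f"[{index}] {device.name.strip()}: "
            f"max work group size {device.max_work_group_size}, "
            f"max work item sizes {list(device.max_work_item_sizes)}")


def describe_devices(devices):
    """Describe every device in the order the platform reports them."""
    return [describe_device(i, d) for i, d in enumerate(devices)]


# Stand-in devices with made-up values (not measured on any real hardware).
fake_devices = [
    SimpleNamespace(name="Integrated GPU (hypothetical)",
                    max_work_group_size=128,
                    max_work_item_sizes=[128, 128, 128]),
    SimpleNamespace(name="Discrete GPU (hypothetical)",
                    max_work_group_size=256,
                    max_work_item_sizes=[256, 256, 256]),
]

for line in describe_devices(fake_devices):
    print(line)
```

With PyOpenCL installed, the same helper could be called as `describe_devices(cl.get_platforms()[0].get_devices())`, which shows which device sits at index 0.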
Guys, I apologize: everything works correctly on your side, the problem is Intel :) For some reason, the integrated video card had first priority. I'm very sorry to have bothered you.
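Since the sample code takes `platform.get_devices()[0]`, the device it runs on depends on the order in which the platform lists its devices. One way to avoid that, sketched below, is to pick the device by name rather than by index. The helper `pick_device` is hypothetical, and the stand-in platform is only there so the sketch runs without an OpenCL runtime; it relies only on `get_devices()` and `name`, which PyOpenCL platform and device objects provide.

```python
from types import SimpleNamespace


def pick_device(platforms, name_hint):
    """Return the first device whose name contains name_hint (case-insensitive)."""
    hint = name_hint.lower()
    for platform in platforms:
        for device in platform.get_devices():
            if hint in device.name.lower():
                return device
    raise LookupError(f"no OpenCL device matching {name_hint!r}")


# Stand-in platform that lists an integrated GPU first (hypothetical ordering).
integrated = SimpleNamespace(name="Integrated GPU (hypothetical)")
discrete = SimpleNamespace(name="AMD Radeon Pro 5500M Compute Engine")
fake_platform = SimpleNamespace(get_devices=lambda: [integrated, discrete])

chosen = pick_device([fake_platform], "radeon")
print(chosen.name)
```

With PyOpenCL this would become `device = pick_device(cl.get_platforms(), "Radeon")` followed by `context = cl.Context([device])`. PyOpenCL also offers `cl.create_some_context()`, which can prompt for a device or read the choice from the `PYOPENCL_CTX` environment variable.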