Question about OpenCL local work size: local size [256,1] doesn't work with global size [512,500]
Hello! I'm really confused: formally there are no mistakes according to the Khronos OpenCL spec, but I get this error:
cl.enqueue_nd_range_kernel(queue, Hasher_kern, [512,500], [256,1], wait_for=None)
>>pyopencl._cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_WORK_GROUP_SIZE
When I run with local_work_size=None, the kernel runs without error, but then any call to get_local_id(0) returns 0.
This code also works correctly on NVIDIA cards.
Please help!
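In case it helps, here is a minimal sketch of how the limits that govern this error can be queried from PyOpenCL (the tiny kernel in it is only a placeholder, not my real one):

# Minimal sketch: query the limits that govern INVALID_WORK_GROUP_SIZE.
# The kernel body below is just a placeholder for this check.
import pyopencl as cl

platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
print("Device:", device.name)
print("CL_DEVICE_MAX_WORK_GROUP_SIZE:", device.max_work_group_size)
print("CL_DEVICE_MAX_WORK_ITEM_SIZES:", device.max_work_item_sizes)

context = cl.Context([device])
program = cl.Program(context, """
__kernel void test(__global uint *buf){ buf[get_global_id(0)]++; }
""").build()
kern = cl.Kernel(program, 'test')

# The per-kernel limit can be lower than the device-wide one.
print("CL_KERNEL_WORK_GROUP_SIZE:",
      kern.get_work_group_info(cl.kernel_work_group_info.WORK_GROUP_SIZE, device))

As I read the 1.2 spec, the local size must divide the global size evenly in each dimension and its product must not exceed the device or per-kernel work-group limit; [256,1] against [512,500] satisfies the divisibility part, which is why I expected it to be accepted.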
Thank you for reporting it. Please provide a minimal reproducible test case and share your clinfo output and other setup details such as GPU, OS, and driver version.
Thanks.
P.S. You have been whitelisted for the Devgurus community.
Thank you for adding me to the whitelist.
Here is some sample code; I am using Python with PyOpenCL.
import pyopencl as cl
import numpy as np

platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)

kernel = """
__kernel void test(__global uint *test){
    uint img_width = 512;
    uint id = get_global_id(0) + img_width * get_global_id(1);
    test[id]++;
}
"""

program = cl.Program(context, kernel).build()
Test_kern = cl.Kernel(program, 'test')

test_np = np.zeros(512 * 500, dtype=np.uint32)
mf = cl.mem_flags
test_buf = cl.Buffer(context, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=test_np)
Test_kern.set_args(test_buf)

test_event = cl.enqueue_nd_range_kernel(queue, Test_kern, [512, 500], [256, 1])
# test_event = cl.enqueue_nd_range_kernel(queue, Test_kern, [512, 500], local_work_size=None)
# The line above works fine, except that get_local_id(..) in the kernel always returns 0 in any dimension.
test_event.wait()
print("Finish")
Here are my GPU specs:
AMD Radeon Pro 5500M Compute Engine (AMD)
Version: OpenCL 1.2
Type: ALL | GPU
Memory (global): 8573157376
Memory (local): 65536
Address bits: 32
Max work item dims: 3
Max work group size: 256
Max compute units: 24
Driver version: 1.2 (Oct 29 2020 23:01:05)
Image support: True
Little endian: True
Device available: True
Compiler available: True
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_APPLE_command_queue_priority cl_APPLE_command_queue_select_compute_units cl_khr_fp64
I am using macOS Catalina 10.15.7.
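For completeness, a minimal sketch that lists every platform and device PyOpenCL can see, to confirm which one get_devices()[0] actually picks on this machine:

# Minimal sketch: list every platform/device PyOpenCL can see, to check
# which device get_devices()[0] actually returns on this machine.
import pyopencl as cl

for p_idx, platform in enumerate(cl.get_platforms()):
    print("Platform", p_idx, ":", platform.name)
    for d_idx, device in enumerate(platform.get_devices()):
        print("  Device", d_idx, ":", device.name,
              "| max work group size:", device.max_work_group_size)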
Guys, I apologize, everything works correctly on your side; the problem is Intel :) For some reason the integrated video card had first priority, so the kernel was running on it instead of the AMD card. I'm very sorry to have bothered you.
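For anyone who runs into the same thing, here is a minimal sketch of one way to pick the discrete GPU explicitly instead of relying on the device order (the substring check on the device name is an assumption; adjust it for your setup):

# Minimal sketch: pick the discrete GPU by name instead of trusting the
# platform/device order (the Intel iGPU came first on my machine).
# The substring check on the device name is an assumption.
import pyopencl as cl

device = None
for platform in cl.get_platforms():
    for d in platform.get_devices():
        if "AMD" in d.name or "Radeon" in d.name:
            device = d
            break
    if device is not None:
        break

if device is None:
    raise RuntimeError("No AMD/Radeon device found")

context = cl.Context([device])
queue = cl.CommandQueue(context)
print("Using:", device.name)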
