Discussion created by Veuvoch on May 10, 2010
Latest reply on May 10, 2010 by omkaranathan
I am working on an ATI RV770, which allows a maximum of 256 work-items in a workgroup. However, when I try to set a workgroup size of 16x16x1 (=256) in the clEnqueueNDRangeKernel function, I get the error CL_INVALID_WORK_GROUP_SIZE. Is this the expected behaviour?

I don't have the problem when I set the workgroup size to 8x8x1.

CLInfo:

    Number of platforms: 1
    Platform Profile: FULL_PROFILE
    Platform Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
    Platform Name: ATI Stream
    Platform Vendor: Advanced Micro Devices, Inc.
    Platform Extensions: cl_khr_icd
    Platform Name: ATI Stream
    Number of devices: 2
    [...]
    Device Type: CL_DEVICE_TYPE_GPU
    Device ID: 4098
    Max compute units: 10
    Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
    Max work group size: 256
    Preferred vector width char: 16
    Preferred vector width short: 8
    Preferred vector width int: 4
    Preferred vector width long: 2
    Preferred vector width float: 4
    Preferred vector width double: 0
    [...]
    Platform ID: 0x7f9cadd5e228
    Name: ATI RV770
    Vendor: Advanced Micro Devices, Inc.
    Driver version: CAL 1.4.635
    Profile: FULL_PROFILE
    Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
    Extensions: cl_khr_icd cl_amd_fp64 cl_khr_gl_sharing cl_amd_device_attribute_query

---------------------------------

Extracts of the code I am working on:

    [...]
    #define NSIZE 4096
    [...]
    const size_t global_work_size[] = {NSIZE, NSIZE};
    size_t local_work_size[2];
    size_t lsize;
    [...]
    err = clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &lsize, NULL);
    printf("%zu\n", lsize); // shows 256

    // largest power-of-two edge whose square fits in the device limit
    lsize = (size_t)sqrt(lsize);
    lsize = 1 << (size_t)log2(lsize);
    local_work_size[0] = lsize;
    local_work_size[1] = lsize;
    printf("%zu\n", lsize); // shows 16
    [...]
    clSetKernelArg(kernel, 0, sizeof(buffer_A), (void*) &buffer_A);
    clSetKernelArg(kernel, 1, sizeof(buffer_B), (void*) &buffer_B);
    clSetKernelArg(kernel, 2, sizeof(buffer_C), (void*) &buffer_C);
    // arguments 3 and 4 are local-memory tiles of lsize*lsize floats
    clSetKernelArg(kernel, 3, lsize*lsize*sizeof(cl_float), NULL);
    clSetKernelArg(kernel, 4, lsize*lsize*sizeof(cl_float), NULL);

    // the following ends in CL_INVALID_WORK_GROUP_SIZE with local work size of 16x16x1
    err = clEnqueueNDRangeKernel(
        queue, kernel, 2, NULL,
        global_work_size, local_work_size,
        0, NULL, &event);
    [...]
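
One thing that may be worth checking: the extract above sizes the tile from the device-wide limit (CL_DEVICE_MAX_WORK_GROUP_SIZE), but OpenCL also reports a per-kernel limit through clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE, and that value can be lower than the device maximum. Whether that is what happens here is an assumption, not something the output above confirms. The sketch below shows how the tile edge could be derived from whichever limit applies; the helper name pick_square_tile and its parameters are hypothetical, and the OpenCL query appears only in a comment so the block compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of OpenCL): returns the largest
 * power-of-two edge t such that
 *   - t * t <= max_group_items   (work-group size limit),
 *   - t     <= max_items_per_dim (per-dimension limit),
 *   - t divides global_edge      (global size must be a multiple
 *                                 of the local size).
 * Returns 0 if no edge satisfies the limits (e.g. max_group_items == 0).
 */
static size_t pick_square_tile(size_t max_group_items,
                               size_t max_items_per_dim,
                               size_t global_edge)
{
    size_t tile = 0;

    assert(global_edge > 0);

    /* t <= max_group_items / t is the overflow-free form of t*t <= max. */
    for (size_t t = 1;
         t <= max_items_per_dim && t <= max_group_items / t;
         t *= 2) {
        if (global_edge % t == 0)
            tile = t;
    }
    return tile;
}

/*
 * Possible use in the host code (assumes kernel and device are valid):
 *
 *   size_t kernel_max;
 *   err = clGetKernelWorkGroupInfo(kernel, device,
 *                                  CL_KERNEL_WORK_GROUP_SIZE,
 *                                  sizeof(size_t), &kernel_max, NULL);
 *   lsize = pick_square_tile(kernel_max, 256, NSIZE);
 *   local_work_size[0] = lsize;
 *   local_work_size[1] = lsize;
 */
```

With the device limit of 256 and NSIZE of 4096 the helper yields an edge of 16, the size that fails above. With a hypothetical per-kernel limit of 64 it would yield 8, the size that is reported to work. Since the local-memory arguments are sized from lsize, they would shrink accordingly.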