
mejlango
Journeyman III

async_work_group_copy error: CL_INVALID_WORK_GROUP_SIZE

Hi,

I'm trying to speed up an OpenCL kernel by using __local memory. Before I added async_work_group_copy, I could set global_work_size to 128 and local_work_size to 64. After adding the async_work_group_copy call, I can no longer set local_work_size to 64: clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE. The following code demonstrates this.

My device is an ATI HD 4500:

CL_DEVICE_MAX_WORK_GROUP_SIZE: 128

CL_DEVICE_MAX_WORK_ITEM_SIZES: 128 128 128
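As a quick sanity check, the device-level conditions for a 1-D launch can be written out as a small predicate. This is a hypothetical helper (the name and parameters are illustrative, not part of the OpenCL API); it covers only the device limits listed above plus the requirement that the global size be a multiple of the local size, and deliberately leaves out the per-kernel limit.

```cpp
#include <cstddef>

// Hypothetical helper: checks a 1-D local size against the device-level
// limits reported by clGetDeviceInfo (CL_DEVICE_MAX_WORK_GROUP_SIZE and
// CL_DEVICE_MAX_WORK_ITEM_SIZES[0]) and against the rule that the global
// size must be evenly divisible by the local size.
// It does NOT check the per-kernel limit (CL_KERNEL_WORK_GROUP_SIZE),
// which can be smaller than the device limit.
bool passesDeviceLimits(std::size_t globalSize, std::size_t localSize,
                        std::size_t deviceMaxWorkGroupSize,
                        std::size_t deviceMaxWorkItemSize0) {
    if (localSize == 0) return false;                       // empty work-group is never valid
    if (localSize > deviceMaxWorkGroupSize) return false;   // too many work-items per group
    if (localSize > deviceMaxWorkItemSize0) return false;   // exceeds limit for dimension 0
    return globalSize % localSize == 0;                     // global must be a multiple of local
}
```

With the values above (global 128, device limits 128 and 128), both 64 and 32 pass these checks, so the device limits alone do not explain why 64 is rejected.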

I couldn't find anything in the documentation explaining why this error occurs. Am I missing something, or doing something wrong?

 

Thanks

size_t globalThreads[1] = {128};
size_t localThreads[1] = {64}; // 64 -> error, but 32 -> OK
cl_int status = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,
                                       globalThreads, localThreads,
                                       0, NULL, &eventSyncKernel);

nou
Exemplar

Query the maximum local size for this kernel with clGetKernelWorkGroupInfo, using CL_KERNEL_WORK_GROUP_SIZE.

On 4xxx cards, IIRC, if a kernel uses a barrier you can't execute a work-group larger than 64. And IMHO async_work_group_copy must execute a barrier internally to ensure that the data is loaded correctly.
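Rather than hard-coding 64 (the 64-item limit for barrier kernels on 4xxx cards is a recollection, not a documented guarantee), the host can read the per-kernel limit with `clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &kernelMax, NULL)` and then pick a local size that respects it. The sketch below covers only that second step; `chooseLocalSize` is a hypothetical helper, and `kernelMax` is assumed to be the value returned by that query.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: choose the largest 1-D local size that
//  - does not exceed kernelMaxWorkGroupSize (the per-kernel limit that
//    clGetKernelWorkGroupInfo reports for CL_KERNEL_WORK_GROUP_SIZE), and
//  - divides globalSize evenly, so the launch is not rejected for that reason.
// Returns 0 if no such size exists (e.g. kernelMaxWorkGroupSize == 0).
std::size_t chooseLocalSize(std::size_t globalSize,
                            std::size_t kernelMaxWorkGroupSize) {
    std::size_t limit = std::min(globalSize, kernelMaxWorkGroupSize);
    // Walk down from the largest allowed size to the first exact divisor.
    for (std::size_t candidate = limit; candidate > 0; --candidate) {
        if (globalSize % candidate == 0) return candidate;
    }
    return 0;
}
```

For example, with a global size of 128 and a kernel limit of 64 this returns 64; if the kernel reports 32, it returns 32. The result would then be passed as localThreads[0] to clEnqueueNDRangeKernel.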
