A simple rule is to use as many work-items as you can. For older cards, the absolute minimum number of work-items is the number of compute units * 64. For the newer GCN architecture, such as the HD 7xxx series, it is the number of compute units * 64 * 4.
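As a worked example of that rule of thumb, write N_CU for the number of compute units. The 32-CU figure below is only an illustrative input, not a claim about any particular card:

$$
W_{\min} =
\begin{cases}
64\,N_{\mathrm{CU}} & \text{older (pre-GCN) cards} \\
4 \cdot 64\,N_{\mathrm{CU}} = 256\,N_{\mathrm{CU}} & \text{GCN (HD 7xxx)}
\end{cases}
\qquad
N_{\mathrm{CU}} = 32 \;\Rightarrow\; W_{\min} = 2048 \text{ (older)},\; 8192 \text{ (GCN)}
$$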
But the goal is to find the minimum number (this greatly boosts the performance of my application)! If possible, I'd like something that can be computed for both CPU and GPU.
I don't want to hard-code this number! But I would already be happy with something like:
int unitsCount = clGetInfo(Units); // Is this information available?
int rule = 64 * 4;
return rule * unitsCount;
Maybe I also have to take the device type (CPU or GPU) into account, and maybe other information could help?
Look at clGetDeviceInfo() with CL_DEVICE_MAX_COMPUTE_UNITS, and at CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE from clGetKernelWorkGroupInfo(). But that still lacks the factor of 4, because GCN launches 4 work-groups on one CU at the same time (or 4 wavefronts from the same work-group if the work-group size is 256).
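A minimal sketch of how those two queried values could replace the hard-coded numbers in the pseudocode from the question. Assumptions: the helper name min_global_work_size and its waves_per_cu parameter are hypothetical, and the factor of 4 for GCN comes from the rule of thumb in this thread, not from any OpenCL query. The helper takes the already-queried values as plain integers (the comment shows the OpenCL calls that would produce them), so it has no OpenCL dependency and can be checked on its own.

```c
#include <stddef.h>

/*
 * Hypothetical helper: estimate the minimum global work size that keeps
 * every compute unit busy, following the rule of thumb in this thread.
 *
 * compute_units:      CL_DEVICE_MAX_COMPUTE_UNITS, e.g. obtained with
 *                     clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
 *                                     sizeof(cl_uint), &cu, NULL);
 * preferred_multiple: CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, e.g.
 *                     clGetKernelWorkGroupInfo(kernel, device,
 *                         CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
 *                         sizeof(size_t), &mult, NULL);
 *                     (64 on the AMD GPUs discussed here)
 * waves_per_cu:       the factor OpenCL does not report. Assumption: 4 for
 *                     GCN, 1 for older cards, per the rule above.
 */
size_t min_global_work_size(unsigned int compute_units,
                            size_t preferred_multiple,
                            unsigned int waves_per_cu)
{
    /* Widen to size_t before multiplying to avoid unsigned int overflow. */
    return (size_t)compute_units * preferred_multiple * (size_t)waves_per_cu;
}
```

For a device reporting 32 compute units and a preferred multiple of 64, this gives 32 * 64 * 4 = 8192 work-items with waves_per_cu = 4, and 2048 with waves_per_cu = 1. Keeping the GCN factor as an explicit parameter makes it visible that it is a per-architecture guess rather than something the runtime tells you.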