I just tried running my kernel with different local sizes, and when I tried 8 by 4 by 4 (256), I received a CL_INVALID_WORK_GROUP_SIZE error from clEnqueueNDRangeKernel. I reduced it to 8 by 2 by 4 and ran again, and then VPU recovery was triggered within 4 seconds. I ran the exact same kernel/setup again and it completed without problems after 6 seconds.
Global work size was 48 by 48 by 48.
What GPU are you using?
You should use clGetKernelWorkGroupInfo with the CL_KERNEL_WORK_GROUP_SIZE flag to get the maximum work-group size that a particular kernel can handle.
I am using the ATI 4870 and it reports a maximum work-group size of 256 (checked with the sample program CLInfo). So my initial attempt shouldn't have errored in the first place.
There's a difference between the maximum work-group size a device can support (returned by CLInfo) and the maximum work-group size the device supports when executing a particular kernel (returned by clGetKernelWorkGroupInfo).
In many cases, when barriers are used, the 7xx series will support a work-group size of no more than 64. You should query that information.
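A minimal sketch of such a check in C, in case it helps. The helper name local_size_fits is hypothetical (it is not part of OpenCL), and kernel_max is assumed to hold the value returned by clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE for your kernel and device. It also assumes the OpenCL 1.x rule that each global dimension must be evenly divisible by the matching local dimension.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if a 3D local size can be used for the
 * given global size and per-kernel limit, 0 otherwise.
 *
 * kernel_max is assumed to come from a query such as:
 *   clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof(size_t), &kernel_max, NULL);
 */
int local_size_fits(const size_t global[3], const size_t local[3],
                    size_t kernel_max)
{
    size_t items = 1; /* total work-items in one work-group */

    for (int d = 0; d < 3; ++d) {
        /* Each global dimension must be a multiple of the local one. */
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;
        items *= local[d];
    }

    /* The product of the local dimensions must not exceed the kernel limit. */
    return items <= kernel_max;
}
```

With the numbers from this thread (global size 48 by 48 by 48), a local size of 8 by 4 by 4 fails against a kernel limit of 64 but would pass against the device limit of 256, while 8 by 2 by 4 fits the kernel limit. Running this check before clEnqueueNDRangeKernel lets you pick a smaller local size instead of hitting CL_INVALID_WORK_GROUP_SIZE.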
Thank you for that information. I didn't realise that at all. I will put in a check now.
But should VPU recovery be triggered if max work group size has been crossed, even if it wasn't allowed?