We know how OpenCL concepts map to GPU hardware, but what about the CPU?
On the A10-7850K, the "max work group size" is 1024. Does this number have any hardware meaning?
When one group stalls, will another group switch in? If yes, how is it implemented?
Thanks in advance.
For the CPU portion, because of the relatively higher cost of synchronization and other fundamental differences, larger work groups make sense. Even so, 1024 is probably fairly arbitrary and depends little, if at all, on the hardware. Why would one work group stall and need to be switched out? In any case, the CPU portion is implemented as OS threads running under the same process; you can verify this with a system process/thread monitor.
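
To make the "threads under one process" picture concrete, here is a toy Python model. It is only a sketch and not how the AMD runtime is actually written: it assumes a pool of OS threads, each pulling whole work groups from a queue and running the work items of a group back to back in a plain loop. The names `run_ndrange`, `num_workers`, and `square` are made up for illustration, and barriers are ignored entirely.

```python
import os
import queue
import threading


def run_ndrange(kernel, global_size, local_size, num_workers=None):
    """Toy model (assumption, not a real runtime): run a 1-D NDRange on OS threads."""
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    num_workers = num_workers or os.cpu_count() or 1

    # Every work group becomes one task in a shared queue.
    groups = queue.Queue()
    for group_id in range(global_size // local_size):
        groups.put(group_id)

    def worker():
        # Each OS thread keeps taking whole work groups until none are left.
        while True:
            try:
                group_id = groups.get_nowait()
            except queue.Empty:
                return
            # Work items of one group run sequentially on this thread,
            # so local_size is just a loop bound in this model.
            for local_id in range(local_size):
                kernel(group_id * local_size + local_id)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Usage: square 4096 values with work groups of 1024 items on 4 threads.
out = [0] * 4096


def square(global_id):
    out[global_id] = global_id * global_id


run_ndrange(square, global_size=4096, local_size=1024, num_workers=4)
```

In this model the work group size never touches the hardware, which fits the idea that 1024 is a software limit. If one of the threads blocks, it is the OS scheduler that runs another thread, rather than anything resembling a GPU switching wavefronts.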