I usually use OpenCL on a GPU. Now I want to run it on a CPU, but I have a question:
If the CPU has "n" compute units (CUs), what are typical values for the global size and local size?
That is, how many work-items usually go in one work-group, and how many work-groups are there, for example if the CPU has 4 CUs?
As far as I understand, the number of CUs is not relevant to the work-group size. The work-group size limit is a "per-CU" property, since each work-group runs on a single CU.
My CPU reports a maximum work-group size of 1024.
Do yourself a favor: download and install CodeXL. You can pull up this info from the menu Tools->System Information, under the "OpenCL devices" tab.
I know it has a maximum, but I read the following passage in the AMD OpenCL Programming Guide.
I think performance will be better if I don't start as many threads as on a GPU. But what is the usual number? For example, if the CPU has 4 CUs, what is the usual total number of threads?
CPU devices only support a small number of hardware threads, typically two
to eight. Small numbers of active work-group sizes reduce the CPU switching
overhead, although for larger kernels this is a second-order effect.
Best performance is obtained when a work-group size of 1 is used and each work-group executes on a single logical core. SIMD instructions are used if you use the appropriate vector data types, e.g. float4, int4, double2 (SSE2), float8, double4 (AVX), or int8 (AVX2).
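To make that advice concrete, here is a minimal sketch in plain C of how the sizes could be chosen on the host. It assumes one work-group per CU with a local size of 1, and that the kernel loops over a contiguous chunk of elements instead of handling one element per work-item. The names `cpu_launch_plan`, `plan_cpu_launch` and `item_range` are hypothetical helpers made up for illustration, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of OpenCL. */
typedef struct {
    size_t global_size;    /* total work-items (global_work_size)      */
    size_t local_size;     /* work-items per work-group (local size)   */
    size_t elems_per_item; /* elements each work-item loops over       */
} cpu_launch_plan;

/* Assumption: one work-group of size 1 per compute unit. */
static cpu_launch_plan plan_cpu_launch(size_t n_elems, size_t compute_units)
{
    assert(n_elems > 0 && compute_units > 0);

    cpu_launch_plan p;
    p.local_size = 1;
    /* Never launch more work-items than there are elements. */
    p.global_size = compute_units < n_elems ? compute_units : n_elems;
    /* Ceiling division so every element is covered. */
    p.elems_per_item = (n_elems + p.global_size - 1) / p.global_size;
    return p;
}

/* Half-open element range [*begin, *end) handled by work-item `gid`.
 * The last work-items may get a shorter (or empty) range. */
static void item_range(const cpu_launch_plan *p, size_t n_elems, size_t gid,
                       size_t *begin, size_t *end)
{
    *begin = gid * p->elems_per_item;
    if (*begin > n_elems) *begin = n_elems;
    *end = *begin + p->elems_per_item;
    if (*end > n_elems) *end = n_elems;
}
```

For the 4 CU case asked about, `plan_cpu_launch(1000000, 4)` gives a global size of 4 and a local size of 1, with each work-item looping over 250000 elements. Those two sizes are what you would pass as `global_work_size` and `local_work_size` to `clEnqueueNDRangeKernel`. Whether exactly one work-group per CU is the best choice on your machine is something to measure; treat it as a starting point.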