      I usually use OpenCL on a GPU. Now I want to run it on a CPU, but I have a question:


      If the CPU has "n" compute units (CUs), what are the usual global and local sizes?


      In other words, how many work items usually go in one work-group, and how many work-groups are there, if the CPU has 4 CUs?



          As far as I understand, the number of CUs is not relevant to the work-group size. Work-groups are a "per-CU" property.
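          To make that relationship concrete: the number of work-groups follows from the global and local sizes alone, and the runtime then distributes those groups over the available CUs. The sketch below is plain C arithmetic, not an OpenCL API call. The function names are hypothetical, and the even spread of groups over CUs is an idealized assumption, since the actual scheduling is up to the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups in one dimension. In OpenCL 1.x the global
   size must be a multiple of the local size, so the division is exact. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Idealized count of groups the busiest CU would run if the runtime
   spread groups evenly over the CUs (hypothetical; real scheduling
   may differ). Uses ceiling division. */
size_t max_groups_per_cu(size_t num_groups, size_t compute_units)
{
    assert(compute_units > 0);
    return (num_groups + compute_units - 1) / compute_units;
}
```

          For example, a global size of 1024 with a local size of 64 gives 16 work-groups; on a 4-CU CPU that is about 4 groups per CU, yet the local size of 64 was picked without looking at the CU count at all.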


          My CPU reports a maximum work-group size of 1024.


          Do yourself a favor: download and install CodeXL. You can pull out this info via the menu Tools -> System Information, under the "OpenCL devices" tab.
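          The same limits can also be read from the host program with clGetDeviceInfo, using CL_DEVICE_MAX_COMPUTE_UNITS and CL_DEVICE_MAX_WORK_GROUP_SIZE. Once you have the maximum, a launch configuration can be sanity-checked before calling clEnqueueNDRangeKernel. The check below is a minimal sketch in plain C for a one-dimensional range. The function name is hypothetical, and it covers only the two rules stated in its comment; a particular kernel's CL_KERNEL_WORK_GROUP_SIZE (from clGetKernelWorkGroupInfo) can be lower than the device maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a 1-D (global, local) pair satisfies two OpenCL 1.x
   rules: local must not exceed the device's maximum work-group size,
   and global must be an exact multiple of local. */
bool local_size_is_valid(size_t global_size, size_t local_size,
                         size_t max_work_group_size)
{
    assert(max_work_group_size > 0); /* devices always report at least 1 */
    if (local_size == 0 || local_size > max_work_group_size)
        return false;
    return global_size % local_size == 0;
}
```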

              I know it has a maximum, but I read the following passage in the AMD OpenCL Programming Guide.


              I think performance will be better if I launch fewer threads than on a GPU. But what is the usual number? For example, if the CPU has 4 CUs, what is the usual total number of threads?



              CPU devices only support a small number of hardware threads, typically two to eight. Small numbers of active work-group sizes reduce the CPU switching overhead, although for larger kernels this is a second-order effect.
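              Following that advice, one common approach on a CPU is to launch only a few work items per hardware thread and let each one loop over a contiguous chunk of the data, instead of launching one work item per element as on a GPU. The C sketch below computes such a split. The names and the oversubscription factor are hypothetical tuning knobs, not values from the guide; the hardware-thread count could come from CL_DEVICE_MAX_COMPUTE_UNITS, which on CPU devices typically matches the number of logical cores, and inside the kernel each work item would use get_global_id(0) as its gid.

```c
#include <assert.h>
#include <stddef.h>

/* Result of splitting n elements into a few large chunks. */
typedef struct {
    size_t global_size; /* number of work items to launch */
    size_t chunk;       /* elements handled by each work item */
} cpu_split;

/* hw_threads: hardware thread count of the CPU device.
   factor: hypothetical oversubscription factor (work items per thread). */
cpu_split split_for_cpu(size_t n, size_t hw_threads, size_t factor)
{
    assert(n > 0 && hw_threads > 0 && factor > 0);
    cpu_split s;
    s.global_size = hw_threads * factor;
    if (s.global_size > n)
        s.global_size = n; /* never launch more items than elements */
    s.chunk = (n + s.global_size - 1) / s.global_size; /* ceiling division */
    return s;
}

/* Half-open range [*begin, *end) processed by work item gid. The last
   items may get a shorter, or even empty, range when n does not divide. */
void chunk_range(size_t n, size_t chunk, size_t gid,
                 size_t *begin, size_t *end)
{
    *begin = gid * chunk;
    if (*begin > n)
        *begin = n;
    *end = *begin + chunk;
    if (*end > n)
        *end = n;
}
```

              With n = 1000 on a 4-CU CPU and a factor of 2, this launches 8 work items of 125 elements each, which could be enqueued with a local size of 1.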

              Best performance is obtained when a work-group size of 1 is used and each work-group executes on a single logical core. SIMD instructions are used if you use the appropriate vector data types, e.g. float4, int4, double2 (SSE2), float8, double4 (AVX), or int8 (AVX2).
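              The vector widths in that list follow from the register size: a 128-bit SSE register holds 4 floats or 2 doubles, and a 256-bit AVX register holds 8 floats or 4 doubles. If each work item processes one such vector, the global size shrinks by the same factor. A small C sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Lanes per SIMD register: 128-bit SSE -> 4 floats / 2 doubles,
   256-bit AVX -> 8 floats / 4 doubles. */
size_t simd_lanes(size_t register_bits, size_t element_bytes)
{
    assert(element_bytes > 0);
    return register_bits / (8 * element_bytes);
}

/* Work items needed when each one handles one vector of `lanes`
   elements (e.g. one float4 in the kernel). Rounds up so no element
   is skipped, so the kernel must guard the partial last vector. */
size_t vector_global_size(size_t n, size_t lanes)
{
    assert(lanes > 0);
    return (n + lanes - 1) / lanes;
}
```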

