OpenCL on multicore CPU

Discussion created by erman_amd on Apr 28, 2011
Latest reply on May 9, 2011 by LeeHowes
How are work-items actually processed by a multicore CPU?


I am trying to understand how work-item execution differs on a CPU. On a GPU, assuming a 1D NDRange, I use

globalWorkSize[0] = (any large number)

localWorkSize[0] = 256; 

This means the work-group size is 256. I read somewhere in this forum that for CPU execution it is best to use a work-group size of 1. So I use

globalWorkSize[0] = (any large number)

localWorkSize[0] = 1; 
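
To make the two setups concrete, here is a minimal sketch in plain C (no OpenCL calls) of how I understand the number of work-groups follows from these two sizes. The function name `num_work_groups` is my own placeholder, not part of the OpenCL API, and I am assuming the OpenCL 1.x rule that the global size must be a multiple of the local size.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): number of work-groups
 * in one dimension for a given global and local work size.
 * Assumes the OpenCL 1.x rule that global must be a multiple of local;
 * returns 0 when that rule is violated or local is 0. */
static size_t num_work_groups(size_t global, size_t local)
{
    if (local == 0 || global % local != 0) {
        return 0; /* invalid combination under the assumed rule */
    }
    return global / local;
}
```

With a hypothetical global size of 1024, `num_work_groups(1024, 256)` gives 4 work-groups of 256 work-items each, while `num_work_groups(1024, 1)` gives 1024 work-groups of one work-item each.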

Does this set the work-group size to 1? Can anyone explain why the work-group size should be 1 (I guess it is related to task parallelism)? How does the CPU actually process the work-items? The OpenCL guide focuses on the GPU (wavefronts, etc.). I can understand that approach, because we want to exploit data parallelism on the GPU. I just want to know in more detail how it is implemented on the CPU, so I can compare the two.

Thank you.