I'm running an OpenCL program on a multi-core CPU using the AMD-APP SDK 2.7. The CPU has 48 cores, and clinfo confirms that the OpenCL runtime sees all of them, but when I watch the system load in top, only 6 of the cores are busy. I'm launching about 20,000 work items in a one-dimensional configuration with a block (work-group) size of 64.
Is there any way to directly control how many OS threads the OpenCL runtime uses?
If not, are there any guidelines for partitioning the work items so that more of the CPU cores are used?