Hi,
I am working in a one-dimensional domain with a fixed number of work-items (global work size), say 1000. I execute a single call to enqueueNDRangeKernel as follows:
queue.enqueueNDRangeKernel( kernel, cl::NDRange(), cl::NDRange(1000), cl::NDRange(1) );
Since I have a six-core AMD CPU (so CL_DEVICE_MAX_COMPUTE_UNITS = 6), OpenCL executes six kernel instances (work-items) simultaneously. However, I would like to tell OpenCL to use only a single CPU core. Is this possible?
I know that I could get serial execution by setting the global work size to 1, changing the code properly, and calling enqueueNDRangeKernel multiple times. Unfortunately, given how the code is structured, "changing the code properly" would be a daunting task.
Set the environment variable:
export CPU_MAX_COMPUTE_UNITS=1
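If the limit should apply to one run only rather than the whole shell session, the variable can be set on the command line of that run. The following is a minimal sketch, assuming the runtime reads CPU_MAX_COMPUTE_UNITS when the process starts; `printenv` stands in for the real OpenCL program (a hypothetical ./my_opencl_app would take its place).

```shell
# Start from a clean state so the demonstration is unambiguous.
unset CPU_MAX_COMPUTE_UNITS

# Set the variable for a single command only.
# printenv is a stand-in for the real program (hypothetical: ./my_opencl_app).
child_value=$(CPU_MAX_COMPUTE_UNITS=1 printenv CPU_MAX_COMPUTE_UNITS)
echo "child process sees: $child_value"

# The variable does not leak into the calling shell.
parent_value=${CPU_MAX_COMPUTE_UNITS:-unset}
echo "calling shell sees: $parent_value"
```

With `export`, by contrast, every program started later from the same shell inherits the setting until it is unset.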
Thank you guys.
Originally posted by: nou
Set the environment variable:
export CPU_MAX_COMPUTE_UNITS=1
Is this specific to AMD's OpenCL implementation, or is the above environment variable part of the OpenCL specification?
Alternatively, if you need more control, you can look at the device fission extension.
http://www.khronos.org/registry/cl/extensions/ext/cl_ext_device_fission.txt
Thank you.