I am working in a one-dimensional domain with a fixed number of work-items (the global work size), say 1000. I execute a single call to enqueueNDRangeKernel as follows:
queue.enqueueNDRangeKernel( kernel, cl::NDRange(), cl::NDRange(1000), cl::NDRange(1) );
Since I have a six-core AMD CPU (so CL_DEVICE_MAX_COMPUTE_UNITS = 6), OpenCL executes six kernel instances (work-items) simultaneously. However, I would like to tell OpenCL to use only a single CPU core. Is this possible?
I know that I could achieve serial execution by setting the global work size to 1, changing the code accordingly, and calling enqueueNDRangeKernel multiple times. Unfortunately, given how the code is structured, "changing the code accordingly" would be a daunting task.