dmeiser

Control over the number of CPU cores used

Hi,

I'm running an OpenCL program on a multi-core CPU using the AMD APP SDK 2.7. The CPU has 48 cores (all of which the OpenCL runtime sees, as verified with clinfo), but when I look at the system load in top, only 6 of the cores are being used. I'm launching about 20000 work-items with a work-group (block) size of 64 in a one-dimensional configuration.
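As a rough check of the launch geometry described above (a sketch, not a description of how the AMD runtime schedules work): 20000 work-items in work-groups of 64 give 313 work-groups, about 6.5 per compute unit on a 48-CU device, so the number of groups by itself should leave work for every core. OpenCL 1.x also requires the global size to be a multiple of the local size, so 20000 has to be padded to 20032 and the kernel must ignore ids past the real item count. The helper names are hypothetical, and 20000 stands in for "about 20000".

```c
#include <stddef.h>

/* Work-groups needed to cover n work-items with groups of `local` items
   (integer ceiling division). */
size_t num_groups(size_t n, size_t local) {
    return (n + local - 1) / local;
}

/* OpenCL 1.x needs the global size to be a multiple of the local size, so
   round n up; the kernel must then return early for ids >= n. */
size_t padded_global(size_t n, size_t local) {
    return num_groups(n, local) * local;
}

/* Average number of work-groups available per compute unit. */
double groups_per_cu(size_t n, size_t local, size_t compute_units) {
    return (double)num_groups(n, local) / (double)compute_units;
}
```

With the numbers from the post, num_groups(20000, 64) is 313, padded_global(20000, 64) is 20032, and groups_per_cu(20000, 64, 48) is about 6.52.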

Is there any way to directly control how many OS threads the OpenCL runtime uses?

If not, are there any guidelines for how to partition the work items in order to use a larger number of CPU cores?

Thanks,

Dominic

Wenju

Hi Dominic,

Maybe your 48-core CPU consists of 8 devices with 6 cores each; I'm just speculating. If so, to use more cores you could divide your data, create more contexts/command queues, and run the computation on each part. I'm not sure about this, but it should work in theory.
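One way to divide the data as suggested, sketched under assumptions: split the work-groups into contiguous chunks, one per command queue, and enqueue each chunk with its own start via the `global_work_offset` argument of `clEnqueueNDRangeKernel` (available since OpenCL 1.1). `partition_groups` is a hypothetical helper; chunk sizes stay multiples of the work-group size so each launch remains valid under OpenCL 1.x.

```c
#include <stddef.h>

/* Hypothetical helper: split the work-groups covering n items (group size
   `local`) into `parts` contiguous chunks whose group counts differ by at
   most one. Chunk i covers global ids [offset[i], offset[i] + size[i]).
   Offsets and sizes are multiples of `local`; the last chunk may run past n,
   so the kernel still needs an `if (id >= n) return;` guard. */
void partition_groups(size_t n, size_t local, size_t parts,
                      size_t *offset, size_t *size) {
    size_t groups = (n + local - 1) / local; /* ceil(n / local) */
    size_t base = groups / parts;            /* minimum groups per chunk */
    size_t extra = groups % parts;           /* first `extra` chunks get one more */
    size_t start = 0;                        /* first group of the current chunk */
    for (size_t i = 0; i < parts; ++i) {
        size_t g = base + (i < extra ? 1 : 0);
        offset[i] = start * local;
        size[i] = g * local;
        start += g;
    }
}
```

For 20000 items, a group size of 64 and 8 parts, the first chunk gets 40 groups (2560 items) and the other seven get 39 groups (2496 items each). If `parts` exceeds the number of groups, some chunks come out empty and should be skipped, since OpenCL 1.x rejects a zero global size.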

Hi Wenju,

Thanks for your response. The 48 cores are exposed to the OS as a single SMP system, and OpenCL recognizes it as such: one device with 48 compute units. With other OpenCL programs I have been able to utilize all 48 cores.

There must be some heuristic built into the AMD OpenCL runtime that decides how many cores to use for a given workload. I would have thought that the work size above (more than 20000 work-items) would be enough to trigger utilization of all cores. It could be because the amount of work done per work-item is fairly lightweight. That's why I'm wondering if there is a way to explicitly override the internal heuristics.

Dominic

Did you try setting the environment variable CPU_MAX_COMPUTE_UNITS?

export CPU_MAX_COMPUTE_UNITS=48

PS: I'm not actually sure how this variable is used; it is rather undocumented.
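If you'd rather set the variable from inside the program than in the shell, a minimal sketch with POSIX `setenv` follows. It assumes, with no documentation to confirm it, that the AMD runtime reads `CPU_MAX_COMPUTE_UNITS` when it initializes, so the call has to happen before the first OpenCL API call such as `clGetPlatformIDs`. `set_cpu_max_compute_units` is a hypothetical wrapper.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical wrapper: set CPU_MAX_COMPUTE_UNITS for this process.
   Assumption: the AMD OpenCL runtime reads the variable when it initializes,
   so call this before the first OpenCL call (e.g. clGetPlatformIDs).
   overwrite = 1 replaces any value inherited from the shell.
   Returns 0 on success, -1 on error, like setenv itself. */
int set_cpu_max_compute_units(const char *value) {
    return setenv("CPU_MAX_COMPUTE_UNITS", value, 1);
}
```

Setting it in the shell with export, as above, remains the simpler option; the in-process version only helps when you want to try several values from one test harness.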

I did not. Will give it a try. Thanks for the tip.

Setting CPU_MAX_COMPUTE_UNITS to 48 didn't increase the CPU utilization. However, when I set it to 1, only one CPU core is used, so perhaps this environment variable only sets an upper bound.

If the work-items are lightweight, you may be seeing runtime overhead that keeps the runtime from utilizing all 48 cores. Check with a process explorer how many threads are launched; htop, for example, can show individual threads.
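Besides htop, the thread count can be read from inside the program. A Linux-only sketch (the `/proc/self/task` layout is Linux-specific, and `count_threads` is a hypothetical helper): call it once before creating the OpenCL context and again after a kernel has run; the difference is roughly the number of threads the runtime added. A thread count only bounds the parallelism; it does not show that those threads are actually busy.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stddef.h>

/* Linux-only sketch: count the threads of the calling process. Each thread
   has one subdirectory under /proc/self/task. Returns -1 if /proc is not
   available. */
int count_threads(void) {
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.') /* skip "." and ".." */
            ++count;
    }
    closedir(dir);
    return count;
}
```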

Thanks for the tip. I haven't yet been able to verify how many threads the OpenCL runtime is launching.

Another possible explanation is that the program spends more time in the sequential sections between the OpenCL kernel calls than I think it should.
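A rough Amdahl's-law estimate shows how little sequential time it takes to produce the picture described in the first post. This is a sketch under two assumptions: the parallel part scales perfectly across cores, and the utilization read in top is an average over the whole run rather than a snapshot taken during a kernel. Both function names are hypothetical.

```c
/* Amdahl's law: if a fraction `serial` of the single-core run time is
   sequential host code and the rest spreads perfectly over `cores`, the
   speedup (equal to the average number of busy cores over the run) is
   1 / (serial + (1 - serial) / cores). */
double amdahl_speedup(double serial, double cores) {
    return 1.0 / (serial + (1.0 - serial) / cores);
}

/* Inverse: the serial fraction that limits the average to `busy` cores.
   Solves busy = 1 / (s + (1 - s) / cores) for s. */
double serial_fraction_for(double busy, double cores) {
    return (cores / busy - 1.0) / (cores - 1.0);
}
```

For example, serial_fraction_for(6.0, 48.0) gives 7/47, about 0.149: roughly 15% of the single-core run time spent in sequential code would already cap the average at 6 busy cores out of 48.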

I'll check whether the sprofiler works with CPU kernels. That should report exactly how many threads are launched.