I am running my OpenCL program on a cluster. The cluster software prefers each job to use a single CPU core so that it can utilize the nodes effectively. I tried using device fission to restrict my code to a single core, but I was dismayed to discover that when multiple instances of my program ran at once, they all used the same CPU core, which is obviously sub-optimal. Is there any way to restrict an OpenCL process to a single core but still allow the process to migrate automatically to other cores (I assume this is handled by the kernel)? Here is my current code (approximately):
// Partition the device into sub-devices of compute_units compute units each.
cl_device_partition_property[3] props = [CL_DEVICE_PARTITION_EQUALLY, compute_units, 0];
cl_uint num_sub_devices;
// First call: query how many sub-devices this partitioning yields.
err = clCreateSubDevices(Device, props.ptr, 0, null, &num_sub_devices);
assert(err == 0, "Failed to create sub-devices: " ~ GetCLErrorString(err));
if(sub_device_idx >= num_sub_devices)
    throw new Exception("Invalid sub-device index " ~ Format("{}", sub_device_idx).idup ~ ".");
scope sub_devices = new cl_device_id[](num_sub_devices);
// Second call: actually create the sub-devices.
err = clCreateSubDevices(Device, props.ptr, num_sub_devices, sub_devices.ptr, null);
assert(err == 0, "Failed to create sub-devices: " ~ GetCLErrorString(err));
// Run on the selected sub-device only.
Device = sub_devices[sub_device_idx];
Note that I always use the same sub_device_idx (which is obviously where this issue comes from). I cannot vary it, because I do not control which nodes the cluster software places my programs on.
Just a shot in the dark: does the (debug) environment variable CPU_MAX_COMPUTE_UNITS still work?
No, unfortunately that doesn't work. It does limit the program to one of the cores, but like with device fission... it is the same one core for every process. If I launch several programs with CPU_MAX_COMPUTE_UNITS=1 they all gang up on the first core.
So it occurred to me that the only real way to pin a thread to a CPU core would be via CPU affinity, which can be changed externally to the program. Normally you do this to pin a thread to a core, but in my case I obviously want the reverse. Here's my current run script that does this:
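#!/bin/sh
# Launch the program in the background and remember its PID.
./my_opencl_program & PID=$!
# Give the OpenCL runtime time to start all of its threads.
sleep 10
echo "Unpinning threads"
# Reset the affinity mask of every thread (mask ffffffff = CPUs 0-31).
for i in $(ps -L -o tid --no-headers -p "$PID")
do
    taskset -p ffffffff "$i"
done
# Block until the program exits.
wait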
It launches my code, grabs the PID of that process... then after a few seconds (necessary because it doesn't launch all the threads right away) it uses taskset and ps to unpin every single thread under that process. So far this seems to work beautifully.
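One caveat with the hard-coded mask: ffffffff only covers CPUs 0-31, so on a node with more cores the threads would still be excluded from the higher-numbered ones. Below is a minimal sketch of building a mask for an arbitrary CPU count, assuming a POSIX shell. The helper name all_cpus_mask is hypothetical (it is not part of taskset), and the mask is assembled as a string so that it does not depend on the shell's integer width.

```shell
#!/bin/sh
# Hypothetical helper: print a hex affinity mask with the lowest N bits set,
# i.e. one bit for each of CPUs 0 .. N-1.
all_cpus_mask() {
    n=$1
    # The leading hex digit covers the N mod 4 leftover bits.
    case $((n % 4)) in
        1) mask=1 ;;
        2) mask=3 ;;
        3) mask=7 ;;
        *) mask= ;;
    esac
    # Every full group of four CPUs contributes one 'f' digit.
    i=$((n / 4))
    while [ "$i" -gt 0 ]; do
        mask="${mask}f"
        i=$((i - 1))
    done
    # N = 0 leaves the mask empty; print 0 in that case.
    printf '%s\n' "${mask:-0}"
}
```

In the loop above, the taskset line could then become something like taskset -p "$(all_cpus_mask "$(getconf _NPROCESSORS_ONLN)")" "$i", assuming getconf _NPROCESSORS_ONLN reports the number of online CPUs on the node (as it does on Linux with glibc).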