I am running serial, single-core CPU benchmarks (as a point of reference) by setting the environment variable CPU_MAX_COMPUTE_UNITS to 1. Unfortunately, each run takes too long (up to 140 times slower than the GPU version!), so I would like to run multiple instances of the serial benchmark simultaneously on my six-core processor to finish the set of benchmarks more quickly.
However, when I try to do this, every instance gets assigned to the same CPU core (the first one, CPU0) instead of each instance getting its own core. How can I run multiple instances of the serial benchmark on separate CPU cores?
I'm using Debian GNU/Linux 64-bit and the AMD SDK v2.2.
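
The setup described above can be sketched roughly as follows. This is only a minimal illustration, not the actual benchmark harness: the helper names (`build_env`, `launch_instances`, `allowed_cpus`) are hypothetical, and the launched command is a placeholder that just sleeps, standing in for the real benchmark binary. It starts several independent processes with CPU_MAX_COMPUTE_UNITS set to 1 and prints the set of CPUs the Linux scheduler allows for each one, which is one way to check whether the instances really are confined to CPU0.

```python
import os
import subprocess
import sys


def build_env(base_env=None):
    """Copy the environment and limit the CPU device to one compute unit."""
    env = dict(os.environ if base_env is None else base_env)
    env["CPU_MAX_COMPUTE_UNITS"] = "1"
    return env


def launch_instances(command, count):
    """Start `count` independent copies of `command`; return the Popen objects."""
    env = build_env()
    return [subprocess.Popen(command, env=env) for _ in range(count)]


def allowed_cpus(pid):
    """Return the CPU indices the scheduler may use for `pid` (Linux only)."""
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Placeholder command (an assumption): replace with the real benchmark.
    command = [sys.executable, "-c", "import time; time.sleep(1)"]
    procs = launch_instances(command, 6)  # one instance per core
    for proc in procs:
        print(proc.pid, sorted(allowed_cpus(proc.pid)))
    for proc in procs:
        proc.wait()
```

Note that `os.sched_getaffinity` reports the mask of the process's main thread only; any worker threads created by the OpenCL runtime may carry a different mask, so a tool such as `top` with the per-thread view may still be needed to see where the work actually runs.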