Was wondering if AMD devs have any insights into this observation:
I've observed that my benchmarks run significantly faster on GPU1 than on GPU0: GPU0's computation time can be 2x and sometimes 4x that of GPU1. I assumed this was because GPU0 is busy with Xorg on the OpenSUSE 13.2 64-bit installation I'm running (3.0 APP SDK on Catalyst 14.12). I eventually killed Xorg to test with nothing else running on the GPU (to the extent I can control that), but the timings stayed the same.
For a certain image processing kernel (secret sauce), GPU1 is a stable 2 ms per call; GPU0 is 5 ms but eventually spikes to 10 ms, with no code or build changes. Another kernel (a modified radix sort) duplicates this behaviour: 14 ms on GPU1, but 28 ms on GPU0 sometimes and almost 70 ms at other times. The input data does not vary in these tests, and execution time should be nearly deterministic since the algorithms are data-independent and have no element of randomness.
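For reference, the per-call numbers above come from a harness along these lines (a simplified Python stand-in, not the actual OpenCL code; `workload` here is a placeholder for the kernel launch plus a blocking wait such as clFinish on the target GPU's queue):

```python
import statistics
import time

def time_calls(workload, iterations=100):
    """Run `workload` repeatedly and return per-call wall-clock timings in ms."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings

def summarize(timings):
    """Min/median/max in ms; a large max-to-min spread indicates jitter."""
    return {
        "min_ms": min(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }

# Stand-in workload for illustration only; in the real tests this is the
# kernel dispatch on the chosen device.
stats = summarize(time_calls(lambda: sum(range(10000))))
```

On GPU1 the spread between min and max stays tight; on GPU0 the median drifts upward over repeated runs.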
This is a big problem if it is an inherent cost of the display-driving GPU, since I'm building single-GPU embedded systems. It's also troublesome right now because I have a lot of real-time work to run on these GPUs and am counting on full utilization.
Any idea what's going on?