I recently started developing with OpenCL and am running fairly basic test programs to try things out. One of my tests runs a simple kernel with a small work size (64 elements) but a large loop inside, so that the execution time is long enough to study on the timeline in CodeXL. In CodeXL I noticed somewhat erratic results when running several of these kernels in parallel.
To remove one source of variability from my setup, I stopped using the AMD GPU for my display and connected the display to the Intel GPU that comes with my motherboard instead. My AMD GPU (a 460) is now headless and - as far as I know - idle unless I send it OpenCL work. The test still runs, but contrary to my expectations the kernel execution time is now longer: it was 800 milliseconds on average before, and in the headless configuration it has gone up to 4000 milliseconds, roughly five times slower.
I'm running Windows 10, 64-bit. The GPU is a 460 with 4 GB of memory. The drivers are up to date.
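To pin down where the extra time goes, the per-kernel time can be split into phases. Below is a minimal sketch of that split, assuming the four timestamps are read with clGetEventProfilingInfo (CL_PROFILING_COMMAND_QUEUED, SUBMIT, START and END, all in nanoseconds) from a queue created with CL_QUEUE_PROFILING_ENABLE. The helper name `profile_breakdown` and the sample timestamps are made up for illustration; they are not my measured values, and I don't know whether CodeXL derives its numbers the same way.

```python
# Sketch: split one OpenCL event's profiling timestamps into phases.
# The four inputs correspond to the clGetEventProfilingInfo queries
# CL_PROFILING_COMMAND_QUEUED / SUBMIT / START / END (nanoseconds).

def profile_breakdown(queued_ns, submit_ns, start_ns, end_ns):
    """Return per-phase durations in milliseconds for one kernel event."""
    ns_per_ms = 1_000_000
    return {
        # Time the command sat in the host-side queue before submission.
        "queue_ms": (submit_ns - queued_ns) / ns_per_ms,
        # Time between submission to the device and the kernel starting.
        "wait_ms": (start_ns - submit_ns) / ns_per_ms,
        # Time the kernel actually spent executing on the device.
        "exec_ms": (end_ns - start_ns) / ns_per_ms,
        # End-to-end time from enqueue to completion.
        "total_ms": (end_ns - queued_ns) / ns_per_ms,
    }

# Hypothetical timestamps, for illustration only (not measured values).
example = profile_breakdown(0, 2_000_000, 5_000_000, 805_000_000)
print(example)
```

If `exec_ms` itself grows in the headless configuration, the kernel is genuinely running slower on the device; if only `queue_ms` or `wait_ms` grows, the delay is in scheduling or submission rather than in the kernel.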
Can someone explain this performance drop?