OpenCL performance drop after subsequent calls

Question asked by derive on Aug 25, 2015

I have encountered a possible OpenCL bug in the most recent Catalyst drivers (15.7.1) on a Radeon HD 7950.

I'm writing an image reconstruction program consisting of relatively simple kernels of the A=B+C kind that are executed iteratively.

For a particular set of parameters, the 14.6 drivers gave a speed of ~0.89 s/cycle.

With the latest drivers, the first few iterations take 0.66 s/cycle (a nice boost), but all following iterations take 1.15 s/cycle. This behaviour appears only when working with double2 arrays larger than 768x768 points; when the same code is run on the CPU, the cycle time stays constant. The environment does not change between cycles: no memory objects are allocated, the card is not overclocked and sits at 60 °C, and the 650 W power supply is more than adequate for the 250 W card.
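To pin down the cycle at which the slowdown starts, the per-cycle timing could be logged along the lines of the sketch below. This is not my actual host code, only an illustration: `run_cycle` is a hypothetical callable standing in for one reconstruction cycle (it would have to block until the device is idle, e.g. by ending with `clFinish`), and `time_cycles`, `first_slow_cycle`, `double2_buffer_bytes`, and the 1.5x threshold are all made-up names and values. The last helper just works out the array size at the threshold: a 768x768 double2 array is 9 MiB.

```python
import time


def time_cycles(run_cycle, n_cycles):
    """Return the wall-clock duration in seconds of each call to run_cycle()."""
    times = []
    for _ in range(n_cycles):
        start = time.perf_counter()
        # Hypothetical: run_cycle() must not return before the device has
        # finished, otherwise only the enqueue cost is measured.
        run_cycle()
        times.append(time.perf_counter() - start)
    return times


def first_slow_cycle(times, baseline_cycles=3, factor=1.5):
    """Index of the first cycle slower than factor * baseline, or None.

    The baseline is the mean of the first `baseline_cycles` durations;
    both defaults are arbitrary choices for illustration.
    """
    baseline = sum(times[:baseline_cycles]) / baseline_cycles
    for i, t in enumerate(times):
        if i >= baseline_cycles and t > factor * baseline:
            return i
    return None


def double2_buffer_bytes(width, height):
    """Size in bytes of a width x height array of double2 (2 x 8 bytes each)."""
    return width * height * 2 * 8


if __name__ == "__main__":
    # Cycle times of the shape reported above, not a real measurement log.
    example = [0.66, 0.66, 0.66, 1.15, 1.15]
    print(first_slow_cycle(example))
    print(double2_buffer_bytes(768, 768) / 2**20, "MiB")
```

Logging the index returned by `first_slow_cycle` across several runs would show whether the drop always happens after the same number of cycles or after a roughly constant elapsed time.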

My question is: what can cause the slowdown, given that the first iterations run at the expected speed but the speed then drops across the board, even for the simplest kernels?