My app prints the time it spent in different kernels and in GPU I/O operations. Sometimes these times are more than 3x larger in one run than in another, and this is of course reflected in the total runtime as well. All the statistics differ: mean time and min time alike. A host reboot usually restores the smaller execution times, but not always. Lately I've also seen the small execution times come back "by themselves", without any host reboot.
These changes don't appear to be correlated with GPU load itself. Sometimes I see small kernel execution times on a fully loaded GPU running other GPU/CPU-intensive apps, and sometimes the app shows large execution times on a completely idle host.
What could be the reason for such dispersion?
For example, one kernel's execution time varies from around 1e6 ticks to 3.5e6 ticks, usually with a big gap between these values. That is, either all kernels have small execution times (roughly the same value each run), or they all have big execution times (again, roughly the same value between runs).
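To make the bimodal pattern concrete, here is a minimal sketch (Python, with made-up tick values mirroring the ~1e6 vs ~3.5e6 gap described above; the threshold is an assumption placed inside that gap) of how the per-run timings cluster into two distinct modes:

```python
# Sketch: split per-run kernel timings into the two observed modes.
# Tick values below are illustrative, not real measurements.
from statistics import mean

def classify_runs(timings_ticks, threshold=2e6):
    """Split run timings at a threshold lying inside the gap between modes."""
    fast = [t for t in timings_ticks if t < threshold]
    slow = [t for t in timings_ticks if t >= threshold]
    return fast, slow

runs = [1.02e6, 0.98e6, 3.4e6, 3.6e6, 1.01e6, 3.5e6]
fast, slow = classify_runs(runs)
print(f"fast runs: {len(fast)}, mean {mean(fast):.2e} ticks")
print(f"slow runs: {len(slow)}, mean {mean(slow):.2e} ticks")
```

The point is that there are essentially no intermediate values: every run falls cleanly into one cluster or the other.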
Could it be a memory alignment issue? What do you think?
P.S. One more observation: during the slow runs, disabling the explicit execution domain setting speeds the kernels up a lot (~2x), while during the fast runs I see no difference between the versions with and without explicit execution domain control...