While testing my workload on a different system, to which I had to move an AMD R9 290 card, I encountered several bottlenecks that slowed my workload down by anywhere from 25% to over 400%. Configuration changes alleviated some of it, but I think I may need to do something programmatically to make my performance more consistent, so I'm looking for suggestions.
Here are some things I did.
1) Deleted a bunch of disk files -- my workload isn't always doing disk I/O, but in this case it was set to do so, and it needs to read/write several GB of files. Normally a large amount of RAM hides the disk I/O pretty well, especially in a repeated run. I only had several GB of free space left, so I cleared up a bunch of disk space, and that dropped my runtimes a lot.
2) I was still at least 200% slower, so I started killing other programs. Nothing helped... finally, I stopped SearchIndexer.exe (disabled Windows Search service) and that got me down to maybe 50% slower than normal. The indexer wasn't using an obviously high number of cycles, but it wasn't quiescent.
3) Disabled processor power management. From Control Panel, find power management; Edit Power Plan; Advanced Settings; Processor Power Management. I set the minimum and maximum processor state to 100% each, and System Cooling Policy to passive. That got me to where I needed to be, matching the performance of the other system.
4) The next day, after restarting Outlook, SearchIndexer came back on and slowed things down again, even though I'd disabled the service. Stopping it again worked. What's interesting is that the machine is not obviously busy, yet my OCL program gets slowed a lot. I haven't tried other OCL workloads yet.
5) I did some experiments with other programs, mainly DirectX things. I used the AMD Leo sample, and also one called HK-2207. Unsurprisingly, running my OpenCL program concurrently with those slows it back down. I noticed that even when I pause the DX sample, my program is still maybe 25% slower.
My questions:
1) What can I do programmatically to make sure my OCL app can run at full speed? Boosting my thread priority is all I can think of at the moment.
2) If I boost thread priority, would I be able to go back to the default processor power management settings (default P-states)?
3) Why would a paused DX program make my OpenCL workload slow down?
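For concreteness, the thread-priority idea from question 1 might look roughly like the sketch below. It is untested against this workload and only changes how the OS schedules the host process on the CPU; whether that helps with the indexer or the DX contention is exactly what I don't know. The helper names `boost_host_priority` and `restore_host_priority` are hypothetical (they are not part of OpenCL). The Windows branch uses `SetPriorityClass` and `SetThreadPriority`; the non-Windows branch is only there so the sketch compiles elsewhere, and raising priority there usually needs extra privileges.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <errno.h>
#include <sys/resource.h>
#endif

/* Hypothetical helpers, not part of any OpenCL API.
   They only affect CPU scheduling of the host process/thread. */

static int g_have_saved = 0;      /* set once the original priority is recorded */
#ifdef _WIN32
static DWORD g_saved_class = 0;   /* original process priority class */
static int g_saved_thread = 0;    /* original priority of the calling thread */
#else
static int g_saved_nice = 0;      /* original nice value of the process */
#endif

/* Record the current priority, then try to raise it.
   Returns 1 if the priority was raised, 0 otherwise. */
int boost_host_priority(void)
{
#ifdef _WIN32
    DWORD cls = GetPriorityClass(GetCurrentProcess());
    int thr = GetThreadPriority(GetCurrentThread());
    if (cls == 0 || thr == THREAD_PRIORITY_ERROR_RETURN)
        return 0;
    g_saved_class = cls;
    g_saved_thread = thr;
    g_have_saved = 1;
    if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
        return 0;
    if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST))
        return 0;
    return 1;
#else
    int cur;
    errno = 0;
    cur = getpriority(PRIO_PROCESS, 0);
    if (cur == -1 && errno != 0)
        return 0;
    g_saved_nice = cur;
    g_have_saved = 1;
    /* A lower nice value means higher priority; this typically fails
       without elevated privileges. */
    return setpriority(PRIO_PROCESS, 0, cur - 5) == 0;
#endif
}

/* Put the priority back to what it was before boost_host_priority().
   Returns 1 on success (or if there is nothing to restore), 0 on failure. */
int restore_host_priority(void)
{
    if (!g_have_saved)
        return 1;
#ifdef _WIN32
    if (!SetPriorityClass(GetCurrentProcess(), g_saved_class))
        return 0;
    if (!SetThreadPriority(GetCurrentThread(), g_saved_thread))
        return 0;
    return 1;
#else
    return setpriority(PRIO_PROCESS, 0, g_saved_nice) == 0;
#endif
}
```

The intended use would be to call `boost_host_priority()` from the thread that enqueues the kernels, before the first enqueue, and `restore_host_priority()` once the run finishes. `HIGH_PRIORITY_CLASS` is used rather than `REALTIME_PRIORITY_CLASS` because it does not need special privileges and is less likely to starve the rest of the system.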