Thanks for sharing the timing information. Yes, I can see the large difference in execution time.
As for a reproducible test case of the OpenCL code, I can't share one because the code is the intellectual property of my customers.
I understand. However, could you please share the host-side code flow (i.e. the sequence of API calls, kernel launches, and any other work) so that we can reproduce something similar on our end?
[For privacy, if you want, I can share my official email address where you can send the code or any other details.]
Interestingly, another user has reported a similar inconsistency in this thread: Kernel Timing Anomalies
[The two issues may not be related, but please take a look at that thread.]
BTW, I've a question. If a dummy kernel is executed before starting the actual processing, do you observe any improvement for the first run? Could you please check and share your observation?
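To make the dummy-kernel check concrete, here is a minimal host-side timing sketch. It is not OpenCL code: `launch` stands in for whatever enqueues the real kernel and blocks until it completes (for example a wrapper that ends with clFinish), `warmup` stands in for the dummy kernel launch, and `time_runs` is a hypothetical helper name. All three are assumptions made for illustration.

```python
import time


def time_runs(launch, runs=5, warmup=None):
    """Time `runs` consecutive calls of `launch`, optionally after a warm-up.

    `launch` is a placeholder for "enqueue the kernel and wait for it";
    `warmup` is a placeholder for the dummy kernel launch. Neither is a
    real OpenCL call; the caller supplies both.
    """
    if warmup is not None:
        warmup()  # untimed dummy launch before the measured runs
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        launch()  # must block until the work has actually finished
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `time_runs(launch)` and `time_runs(launch, warmup=dummy)` in two fresh processes and comparing the first entry of each list should show whether the dummy launch absorbs the first-run penalty.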
First, regarding the Overdrive_Sample, after I attached the AMD GPU as display, it is recognized as adapterActive but still states: Can't get Overdrive capabilities
Sorry, I'm not aware of this. You can expect a separate reply from velan or someone else from the ADL team.
Hi Tomer!
Bear in mind that keeping a steady clock has its disadvantages:
1.) When the GPU stays constantly at high clocks it also consumes its maximum TDP. Will it be acceptable for the medical device if the Pitcairn GPU alone consumes 130W all the time instead of 10W on idle? What about noise?
2.) Even with fixed high clocks, the first launch will always be slower because nothing is cached yet (at least not in the ISA cache and the constant cache).
Yes, I'm aware of that.
I found one cause of the performance difference. I will start a new thread for it and should open a bug as well.
Usually for medical devices there is idle time, when nothing is being done; that's the time to set the power settings back to default.
Then there is the processing time, a specific duration in which a patient is being scanned; that's when performance matters and everything is expected to be steady.
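The idle/scan split described above maps naturally onto a scoped policy switch. The sketch below only illustrates that control flow: `set_policy` is a hypothetical callback standing in for whatever mechanism actually changes the GPU power state (no ADL or Overdrive call is shown), and the policy names "high" and "default" are made up for the example.

```python
from contextlib import contextmanager


@contextmanager
def steady_clocks(set_policy, busy="high", idle="default"):
    """Hold the `busy` power policy for the duration of a scan.

    `set_policy` is a hypothetical callable supplied by the caller.
    The `finally` clause restores the idle policy even if the
    processing inside the `with` block raises an exception.
    """
    set_policy(busy)
    try:
        yield
    finally:
        set_policy(idle)
```

The scan would then be wrapped as `with steady_clocks(my_set_policy): process_scan()`, where `my_set_policy` and `process_scan` are again placeholder names. Restoring the default in `finally` keeps the device from being left at high clocks if a scan aborts.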
Anyway, as I said, the sample for controlling the power policy doesn't work.
Dipak, I can't send a reproducible test case because this is a very large piece of software that also wraps all of the OpenCL code, so it would take a lot of time to strip it down to something I could send.