I have observed that many kernels do not run long enough for the GPU to reach its maximum clock frequency. Once a kernel finishes, the frequency seems to drop back to its low level almost instantly. This is most evident in programs that launch many short kernels in sequence: in many cases the clock never reaches a high frequency, because it drops again after every kernel invocation. As a result, many OpenCL programs show poor performance, e.g. about a half or a third of what is expected.
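For anyone who wants to watch the clock behaviour themselves, here is a minimal sketch. It assumes the amdgpu driver, which lists the shader clock levels in pp_dpm_sclk and marks the active one with "*"; other drivers may not provide this file. The helper name current_sclk and the SCLK_FILE variable are made up for illustration.

```shell
# Assumption: amdgpu exposes the shader clock levels in this file, one per
# line, e.g. "1: 608Mhz *", with "*" marking the currently active level.
# SCLK_FILE can be overridden to point at another card or a test file.
SCLK_FILE="${SCLK_FILE:-/sys/class/drm/card0/device/pp_dpm_sclk}"

# current_sclk (hypothetical helper): print the frequency of the active level.
current_sclk() {
    awk '/\*/ { print $2 }' "$SCLK_FILE"
}
```

Calling current_sclk in a loop from a second terminal while the OpenCL program runs should show whether the clock ever leaves its lowest level.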
Doing this: echo high >/sys/class/drm/card0/device/power_dpm_force_performance_level
seems to be a workaround, but I don't find it normal to have to set this permanently or to do it every time I run an OpenCL program.
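Until this is fixed, the workaround can at least be limited to the duration of a single program. The following is only a sketch: the function name run_with_high_clocks and the DPM_FILE variable are my own, and writing to the real sysfs file requires root.

```shell
# Path from the workaround above; DPM_FILE can be overridden for another
# card or for trying the function against an ordinary file.
DPM_FILE="${DPM_FILE:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"

# run_with_high_clocks (hypothetical helper): force the "high" level, run the
# given command, then restore whatever level was set before.
run_with_high_clocks() {
    previous=$(cat "$DPM_FILE") || return 1   # remember the current level, e.g. "auto"
    echo high > "$DPM_FILE" || return 1       # needs root on the real sysfs file
    "$@"                                      # run the OpenCL program
    status=$?
    echo "$previous" > "$DPM_FILE"            # put the saved level back
    return "$status"
}
```

Usage would look like "run_with_high_clocks ./my_opencl_program", where the program name is a placeholder. Note that the level is not restored if the shell is killed while the program is running.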
Perhaps, if the "auto" performance level let the GPU retain a high clock frequency for a short period of time after a kernel finishes, this problem could be eliminated.