The profiler shows that ~14% of the process's CPU time is spent in kernel calls (ntkrnlpa.exe) and ~13% inside aticaldd.dll. For comparison, in the CPU version of the app the same function took only 18% of process time.
As it stands, there is no point in using the GPU, because it increases the CPU time spent instead of decreasing it.
I tried to reduce the number of kernel calls by increasing the domain size, but as the domain size grows further the kernels begin to fail with errors (see my other thread for an example of such an error). At this stage I don't care much about the efficiency of the GPU code; all I need is to decrease CPU time (there is a reason to use a coprocessor at all only if CPU time goes down).
About 70% of the time spent in aticaldd.dll falls on calls to the calddiGetExport function. What does this function do, and what part of my code could cause it to be called more often?