Recently I changed the algorithm in my app to keep more data directly on the GPU, which considerably decreased the number of buffer mappings (each buffer mapping was also a sync point).
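For context, a rough sketch of the kind of sync point that largely went away (OpenCL host side, assuming blocking maps; the function, `buf` and `nbytes` are placeholders, not my actual code):

    #include <CL/cl.h>

    /* A blocking map forces the host to wait for the GPU before touching
     * the results, so each such call also acted as a sync point. */
    static void read_results(cl_command_queue queue, cl_mem buf, size_t nbytes)
    {
        cl_int err;
        void *ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE /* blocking */,
                                       CL_MAP_READ, 0, nbytes,
                                       0, NULL, NULL, &err);
        /* ... consume the mapped data on the host ... */
        clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);
    }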
I expected the run time to improve due to the increased GPU load, and CPU time to decrease as well, since the CPU now has less work to do.
But almost all I got was a sharp increase in CPU time. Before the change, CPU time made up only a small part of the elapsed time; now CPU time is almost equal to elapsed time, i.e. nearly 100% CPU usage for the whole run of the app.
I tried to avoid this CPU usage by putting the working thread to sleep before sync points, with no success: while the logs show that the app polls the event and sleeps until it gets the corresponding status, CPU time did not decrease.
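Here is a minimal sketch of that workaround, assuming an OpenCL host on Windows (this is the marker event plus Sleep(1) loop mentioned in the edit below; the function name and `queue` are placeholders):

    #include <windows.h>   /* Sleep */
    #include <CL/cl.h>

    /* Enqueue a marker after the batch of GPU commands, then poll its
     * status with Sleep(1) between polls instead of a blocking wait,
     * so the host thread yields the CPU while the GPU works. */
    static void wait_gpu_politely(cl_command_queue queue)
    {
        cl_event marker;
        clEnqueueMarker(queue, &marker);   /* completes when all prior commands do */
        clFlush(queue);                    /* make sure the batch is submitted */

        cl_int status = CL_QUEUED;
        do {
            Sleep(1);                      /* give up the CPU for ~1 ms per poll */
            clGetEventInfo(marker, CL_EVENT_COMMAND_EXECUTION_STATUS,
                           sizeof(status), &status, NULL);
        } while (status > CL_COMPLETE);    /* error codes are negative, so this also exits */

        clReleaseEvent(marker);
    }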
So I used Process Explorer to find out what was consuming the CPU. It turned out to be an AMD driver thread with the following stack:
The picture clearly shows that this thread is the main CPU consumer in the app's process:
So the question is: how can this behavior be avoided? It seems that without a large number of sync points between the GPU and the host code, the AMD driver goes mad and starts using a whole CPU core for its own needs.
EDIT: Here are CodeXL Timeline screenshots showing how the timeline looked before
and with a marker event and a Sleep(1) loop until the event is reached (as sketched above):
As one can see (profiling was done on a C-60 APU, but a discrete HD 6950 under different drivers shows the same CPU usage pattern), there is a fairly large interval of ~80 s where the GPU works on its own without synchronizing with the host, and that is exactly where the AMD driver starts to consume a whole CPU core.
EDIT2: It is very similar to the issue described here: Re: Cat13.4: How to avoid the high CPU load for GPU kernels? by .Bdot
Any new cures since 2013?