I am running a Barnes-Hut N-body simulation, and if I increase the body count above a certain threshold while running the kernel on the GPU, my computer becomes unresponsive. Is there any remedy for this? As the body count increases, the calculation time for each kernel execution becomes longer, but the increase should be logarithmic, not linear. Even given this, it seems as though the GPU devotes all its resources to executing my entire batch of commands before allowing any other work.
Someone on the Khronos forum explained to me that this is a known issue and is being looked into. I tried a couple of approaches to avoid this problem:
I manually split my clEnqueueNDRangeKernel call into a few smaller calls: this didn't help; in fact, in some cases contention was worse.
I tried to use clCreateSubDevices to select a subset of compute units so as to leave some cores free for other processing: this isn't possible, as the GPU doesn't support creating sub-devices (even though the query for the maximum number of sub-devices returns the same value as the number of compute units), which is understandable.
The OS/AMD driver schedules GPU tasks at command-buffer granularity. If two NDRange calls are placed in the same command buffer, the GPU scheduler cannot insert draw commands between them. If you would like to improve the responsiveness of the desktop, break your NDRange call into several calls and insert clFlush between them (you might pay a performance penalty for this).
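A minimal sketch of the split-and-flush idea, not a tested fix for this particular driver. The helper names chunk_bounds and enqueue_in_chunks are hypothetical, as is the USE_OPENCL guard (it only lets the chunk arithmetic compile without the OpenCL headers). The sketch assumes a 1-D NDRange, OpenCL 1.1 or later (a non-NULL global_work_offset is not allowed in 1.0), that total and chunk are multiples of the work-group size local, and that the work-items are independent enough to run in separate launches.

```c
#include <stddef.h>

/* Hypothetical helper: size of chunk number `idx` when `total` work-items
 * are split into pieces of at most `chunk` items. The chunk's starting
 * offset is stored in *offset. Returns 0 once idx is past the end. */
static size_t chunk_bounds(size_t total, size_t chunk, size_t idx, size_t *offset)
{
    if (chunk == 0 || idx > total / chunk) {
        *offset = total;
        return 0;
    }
    size_t start = idx * chunk;          /* cannot overflow: start <= total */
    if (start >= total) {
        *offset = total;
        return 0;
    }
    *offset = start;
    size_t remaining = total - start;
    return remaining < chunk ? remaining : chunk;
}

#ifdef USE_OPENCL                        /* build with -DUSE_OPENCL -lOpenCL */
#include <CL/cl.h>

/* Hypothetical wrapper: launches the kernel one chunk at a time and calls
 * clFlush after each launch so the commands are submitted to the device
 * separately instead of being batched together. */
static cl_int enqueue_in_chunks(cl_command_queue queue, cl_kernel kernel,
                                size_t total, size_t chunk, size_t local)
{
    size_t offset;
    for (size_t i = 0; ; ++i) {
        size_t size = chunk_bounds(total, chunk, i, &offset);
        if (size == 0)
            break;

        /* get_global_id(0) in the kernel already includes `offset`. */
        cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, &offset, &size,
                                            &local, 0, NULL, NULL);
        if (err != CL_SUCCESS)
            return err;

        err = clFlush(queue);            /* submit this chunk now */
        if (err != CL_SUCCESS)
            return err;
    }
    return clFinish(queue);              /* wait for all chunks to complete */
}
#endif
```

Smaller chunks give the display more chances to get GPU time but add launch overhead, so the chunk size is something to tune by experiment.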
The truly good solution is to use two GPUs, one for display and one for compute.