How does the DirectX pipeline work in current drivers?
I have an AMD Radeon HD6870, and lately I've been playing around with DirectX 11, optimizing render calls, state setting, etc., and I've observed a few anomalies.
First off, with 2000 draw calls my scene ran at 100 FPS. GPU Perf Studio said I was draw call bound, which I guess was accurate, but why was the GPU reported as only ~60% busy? Does being draw call bound basically mean that the driver spends more time on setup? I thought the card and its caches incur extra latency if each draw call carries less than a certain amount of data.
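To put rough numbers on that, here is a minimal back-of-the-envelope sketch in C++. It only does arithmetic on the figures above; the function names are made up for illustration, and the per-draw figure assumes (purely as a simplification) that the whole frame is spent submitting draw calls.

```cpp
#include <cassert>

// Milliseconds available per frame at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Average time budget per draw call in microseconds.
// ASSUMPTION: the entire frame is spent on draw call submission,
// which is the worst case for a draw-call-bound frame.
double perDrawBudgetUs(double fps, int drawCalls) {
    assert(drawCalls > 0);
    return frameTimeMs(fps) * 1000.0 / drawCalls;
}

// Milliseconds per frame during which the GPU is reported busy,
// given a utilization fraction in the range [0, 1].
double gpuBusyMs(double fps, double gpuUtilization) {
    assert(gpuUtilization >= 0.0 && gpuUtilization <= 1.0);
    return frameTimeMs(fps) * gpuUtilization;
}
```

With my numbers (100 FPS, 2000 draw calls, ~60% busy) that is a 10 ms frame, about 5 µs per draw call if submission alone filled the frame, and roughly 6 ms of GPU work with about 4 ms of the frame idle.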
I've merged same-material objects into ~200 draw calls to get a 30% improved framerate, and now the GPU reports 97-99% usage. Now for an anomaly I've noticed: I have a big 4-piece terrain that always gets rendered from my test viewpoint. If I don't merge the pieces, it renders faster than if I merge them into one big draw call. It's not a big difference, just 2-3 FPS, but it still doesn't make much sense unless the draw calls are processed in some mysterious way.
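The kind of merging described above can be sketched roughly as follows. This is an illustrative sketch, not the actual code from my renderer: `Vertex`, `Mesh`, `Batch`, and `mergeByMaterial` are hypothetical names, and it assumes the vertices are already in world space so that meshes sharing a material can share one vertex/index buffer and be drawn with a single call.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical scene object: one material, one indexed triangle list.
struct Mesh {
    int materialId;
    std::vector<Vertex> vertices;        // assumed to be in world space
    std::vector<std::uint32_t> indices;  // indices into `vertices`
};

// Merged geometry for one material; intended to map to one draw call.
struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Concatenate all meshes that share a material into a single batch,
// rebasing each mesh's indices by the vertex count already in the batch.
std::map<int, Batch> mergeByMaterial(const std::vector<Mesh>& meshes) {
    std::map<int, Batch> batches;
    for (const Mesh& m : meshes) {
        Batch& b = batches[m.materialId];
        const auto base = static_cast<std::uint32_t>(b.vertices.size());
        b.vertices.insert(b.vertices.end(), m.vertices.begin(), m.vertices.end());
        for (std::uint32_t i : m.indices) {
            assert(i < m.vertices.size());  // sanity check on the input mesh
            b.indices.push_back(base + i);
        }
    }
    return batches;
}
```

Each resulting `Batch` would then be uploaded once and drawn with one indexed draw call, so the draw call count drops to the number of distinct materials.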
One of my biggest question marks about them is: does the GPU process draw calls completely serially? I could see how the pixel stages need to be serial, but the vertex processing stage could happen in parallel for two draw calls. I remember a slide from AMD showing how the HD6950/HD6970 can rasterize 2 polygons in parallel, but what about the rest of the pipeline? Do pixels only get processed after all vertices are processed, even if there are free shader units at its disposal?
Basically... is there any kind of documentation about how drivers make the card work?