After having only very minimal success with GPU PerfStudio, I decided to attempt integrating GPUPerfAPI into the engine that we are using. However, I am having two issues, the most important of which is the driver getting stuck in an infinite loop.
I am enabling only the GPUTime counter so that only 1 pass is required, which should hopefully make it useful for real-time performance analysis in a game, since the exact same frame does not have to be rendered more than once. I am doing one session, one pass, and one sample per frame. The sample encompasses all GPU commands for the frame.
The first (and less troubling) issue is that the first four sessions never become ready (GPA_IsSessionReady). If I try to read the counter value anyway (blocking), then I get a freeze in the driver. I can work around this by starting with the 5th session when I go to read the results.
However, after 32 sessions have been issued, the next call to GPA_BeginSession freezes in an infinite loop in the driver (it freezes trying to begin the 33rd session). Stepping through the assembly, it is stuck in the following loop in GPUPerfAPI-DX11-x64.dll:
000007FED42A2160 mov rax,qword ptr [rbx]
000007FED42A2163 mov rcx,rbx
000007FED42A2166 call qword ptr [rax+18h]
000007FED42A2169 test al,al
000007FED42A216B je 000007FED42A2160
When I pause the program in the debugger, the call stack usually goes from the GPUPerfAPI dll, into the Microsoft DX11 dll, and into an ATI driver dll. Usually something like this:
[Frames below may be incorrect and/or missing, no symbols loaded for atidxx64.dll]
I have experienced this issue with both the 12.4 driver and 898 Release Candidate 5 driver. I am running an MSI R7950 card.
Is there something that I am doing wrong?
I forgot to mention that for the second issue (freezing after 32 sessions), no API calls are returning errors, and the callback that I registered using GPA_LOGGING_ERROR_AND_MESSAGE did not spit anything out.
I would imagine that the results from the first frame will not be available until several frames later. With a good engine, the CPU should be at least 2 frames ahead of the GPU in its processing; otherwise you are not giving the GPU enough work to do. Try keeping a running list of pending sessions and, each frame, querying whether any of them have results available; if so, report those values. It sounds like there may be a 4-5 frame delay before the results become available. Alternatively, you may be able to put in a flush before asking whether the session is ready (although I believe that's what the blocking function does). That would normally be a bad thing, though, because you would be synchronizing the CPU and GPU and slowing down your application.
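To make the "running list of pending sessions" idea concrete, here is a minimal sketch. The `PendingSessions` class and its method names are hypothetical and not part of GPUPerfAPI; the readiness check is passed in as a callback, which is where a non-blocking call such as GPA_IsSessionReady would go. The cap on pending sessions is an assumption meant to keep the application under the active-session limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical helper (not a GPUPerfAPI type): remembers sessions whose
// results have not been read yet and polls them without blocking.
class PendingSessions {
public:
    // Callback that answers "are the results for this session ready?".
    using ReadyFn = std::function<bool(std::uint32_t)>;

    explicit PendingSessions(std::size_t maxPending) : maxPending_(maxPending) {
        assert(maxPending > 0);
    }

    // Record a session once its commands have been submitted.
    // Returns false when the list is full, so the caller can skip
    // profiling this frame instead of starting yet another session.
    bool Push(std::uint32_t sessionId) {
        if (pending_.size() >= maxPending_) return false;
        pending_.push_back(sessionId);
        return true;
    }

    // Check every pending session (not only the oldest one) and remove
    // those that report ready. The returned ids are ordered oldest first.
    std::vector<std::uint32_t> CollectReady(const ReadyFn& isReady) {
        std::vector<std::uint32_t> ready;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (isReady(*it)) {
                ready.push_back(*it);
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        return ready;
    }

    std::size_t Size() const { return pending_.size(); }

private:
    std::size_t maxPending_;
    std::deque<std::uint32_t> pending_;
};
```

Each frame, the engine would call `Push` with the new session id and then `CollectReady`, reading counter values only for the ids that come back. Checking every entry rather than just the oldest means a session that never becomes ready does not hold up the results of later ones, although it still occupies a slot.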
When the results do come back, make sure you are ending the session, otherwise we will think that you want to keep it around. Currently there is a limit of 32 active sessions, so I'm not surprised that is where you are seeing an issue. However, the 33rd session should cause the 1st one to be overwritten. We'll have to add a test for this and address the issue.
Let me know if you continue to have trouble after trying to delay the query for results and after ensuring that you call GPA_EndSession(..).
The problem that I am encountering is that the results for the first 4 frames (sessions) are never available, even though the results for frames/sessions 5+ are available.
For each frame, I do the following:
At this point, we are simply using Direct3D11 Event Queries for GPU timing, and it seems to be working pretty well. I was hoping to use the ATI counters in order to get more specific information (utilization of the different units of the GPU in order to determine bottlenecks). As it stands, I can only get timing information from the ATI performance counters (because getting more information requires rendering the same frame several times...up to 12), so they are not providing any benefit over the D3D queries.
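As a rough sketch of the arithmetic behind that kind of D3D query timing: this assumes the frame is bracketed by two timestamp queries inside a timestamp-disjoint query, which may differ from the exact setup used above. The `TimestampDisjoint` struct below is a hypothetical stand-in that mirrors the two fields D3D11 reports for a disjoint query (tick frequency and a disjoint flag), so the snippet compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the data returned by a D3D11 timestamp-disjoint
// query: GPU ticks per second, and whether the interval was unreliable.
struct TimestampDisjoint {
    std::uint64_t frequency;
    bool disjoint;
};

// Convert a pair of GPU timestamps into milliseconds.
// Returns false (leaving outMs untouched) when the measurement should be
// discarded: the interval was disjoint, the frequency is zero, or the
// timestamps are out of order.
bool GpuTimeMs(std::uint64_t beginTicks,
               std::uint64_t endTicks,
               const TimestampDisjoint& info,
               double& outMs) {
    if (info.disjoint || info.frequency == 0 || endTicks < beginTicks) {
        return false;
    }
    outMs = static_cast<double>(endTicks - beginTicks) * 1000.0 /
            static_cast<double>(info.frequency);
    return true;
}
```

As with the counter sessions, the query results would be read back a few frames late so that the CPU never has to wait on the GPU.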