Scripts that I used, system info, samples, versions of the OS and WPT, etc.
The beginning
Let's start from the beginning. I use the Windows Performance Toolkit for the performance analysis that I conduct systematically, and also for comparisons before and after the changes I test. Recently, I started taking measurements for exactly this reason: to determine whether a particular setting provides a detectable performance boost in the system.
During the measurements, I noticed an anomaly in a metric (DPC/ISR Enter Time min). When I navigated to Graph Explorer > Computation > DPC/ISR > DPC/ISR Duration by Module, Function, I observed negative values in dxgkrnl.sys, such as -1,449,211,376.700 (µs). The other metrics, although extremely high, did not have negative values. Additionally, there were no issues with other drivers besides dxgkrnl.sys and occasionally Wdf01000.sys.
Since we are talking about core Windows drivers, high values are to be expected, but always within reasonable limits.
Measurement procedure
The measurements were conducted with xperf, using the script available in the post. The main command that enabled the DPC/ISR providers was "xperf -on base+interrupt+dpc". By default, the recording used memory logging mode rather than file logging mode. I emphasize this because, as I will mention later, Jeff Stokes (ETL and WPA expert) noted the following:
"So WPR records in memory mode, it's common to get traces where it's missing a lot of the CPU graph due to the buffer being set too conservatively. It doesn't report as dropped events, but you'll see all your CPU time is hyper-compressed in one small area on the right side of the graph. WPR -filemode fixes this."
- Based on this, it could be possible that an issue arises due to recording in memory mode. On the other hand, Pavel Yosifovich mentioned the following:
"Maybe the sum overflows (32-bit) and that's why it becomes negative."
- Based on this, we might be dealing with a bug in WPA.
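To make the overflow hypothesis concrete, here is a minimal Python sketch of how a sum that is correct as a wide integer turns negative when it is held in a signed 32-bit accumulator. The per-sample totals are made-up numbers, and nothing here is confirmed about how WPA actually stores or sums durations. It only illustrates the arithmetic that Pavel's comment refers to.

```python
def wrap_int32(value: int) -> int:
    """Interpret an integer as a signed 32-bit two's-complement value."""
    return (value + 2**31) % 2**32 - 2**31


# Hypothetical per-sample totals in microseconds (illustrative only).
sample_totals_us = [900_000_000, 900_000_000, 900_000_000]

true_sum = sum(sample_totals_us)    # exceeds the int32 maximum of 2,147,483,647
wrapped_sum = wrap_int32(true_sum)  # what a 32-bit accumulator would hold

print(f"true sum:    {true_sum:,}")
print(f"wrapped sum: {wrapped_sum:,}")
```

Each individual total fits in 32 bits and only the combined sum wraps. That would be consistent with single samples looking normal while the combined view shows negative values, but it remains a hypothesis.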
The way I conducted the tests was as follows: I used an xperf script with the command mentioned above and made sequential recordings, typically around 50, with each lasting 10 seconds. I conducted measurements both in idle mode and during games to capture the system's behavior under heavy workload. In both cases, I observed these contaminated results with negative values and a high number of DPCs/ISRs, with the main culprit always being dxgkrnl.sys.
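The recording loop described above can be sketched roughly as follows. This is not the script from the post. The output directory, the file naming scheme, and the function names are assumptions made for illustration. Only "xperf -on base+interrupt+dpc" comes from the text, and "xperf -d" is the standard way to stop the kernel session and write the merged ETL file.

```python
import subprocess
import time
from pathlib import Path

KERNEL_FLAGS = "base+interrupt+dpc"  # flags taken from the command in the post


def build_commands(index: int, out_dir: Path) -> tuple[list[str], list[str]]:
    """Return the (start, stop) xperf command lines for recording number `index`."""
    etl_path = out_dir / f"sample_{index:02d}.etl"  # hypothetical naming scheme
    start = ["xperf", "-on", KERNEL_FLAGS]
    stop = ["xperf", "-d", str(etl_path)]  # stop the session and merge to ETL
    return start, stop


def record_all(count: int = 50, seconds: float = 10.0,
               out_dir: Path = Path("traces")) -> None:
    """Run `count` back-to-back recordings of `seconds` each.

    Must be run from an elevated prompt on Windows with xperf on PATH.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(1, count + 1):
        start, stop = build_commands(i, out_dir)
        subprocess.run(start, check=True)
        time.sleep(seconds)
        subprocess.run(stop, check=True)
```

Calling record_all() with its defaults would produce 50 traces of about 10 seconds each, which matches the procedure described in this post.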
I should note that I exported the data from xperf in two ways: as a TXT report and as the classic ETL file. Two things were almost always true. First, if I opened individual samples in WPA, there were no strange or negative numbers, but when I opened them together, the anomalies appeared. Second, the TXT reports exhibited the problem, whereas the corresponding ETL file showed nothing unusual at first glance when I opened the counters. This left me at an impasse.
Below, I present a sample TXT report to illustrate what I mean. Additionally, the frequency at which I observed the problem was such that, out of 50 samples, at least 5 would be problematic based on the TXT file or if I opened all ETLs together as a single trace in WPA.
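One way to count how many TXT reports are affected is to scan each file for negative numeric tokens. The sketch below assumes nothing about the report layout beyond "negative numbers should not appear". The directory layout and function names are hypothetical, and the text encoding of the reports may need adjusting.

```python
import re
from pathlib import Path

# Matches negative numbers such as -1,449,211,376.700 or -42. The lookbehind
# skips hyphens that follow a word character, e.g. ranges like "10-20".
NEGATIVE_NUMBER = re.compile(r"(?<![\w.])-\d[\d,]*(?:\.\d+)?")


def negative_values(text: str) -> list[float]:
    """Return every negative number found in `text`, thousands separators removed."""
    return [float(m.group().replace(",", "")) for m in NEGATIVE_NUMBER.finditer(text)]


def problematic_reports(report_dir: Path) -> dict[str, list[float]]:
    """Map each *.txt report in `report_dir` that contains negatives to those values."""
    hits: dict[str, list[float]] = {}
    for path in sorted(report_dir.glob("*.txt")):
        values = negative_values(path.read_text(errors="replace"))
        if values:
            hits[path.name] = values
    return hits
```

Running problematic_reports(Path("reports")) over a batch of 50 reports would give the list of affected samples directly, instead of checking each file by eye.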
Here are the initial steps I took to identify the cause:
- Reduced the number of recordings to 40, then to 30, and so on.
- Decreased the duration of each recording or increased it in some cases.
- Tried different versions of the Windows Performance Toolkit (WPT).
- Conducted recordings with Windows Performance Recorder (WPR).
- Modified the xperf commands to see if it would resolve the issue.
- Opened the recordings with WPA Preview.
- Reinstalled Windows 10 with all updates and the latest drivers.
- Tried Windows 11.
- Ran the DirectX Diagnostic Tool without detecting any issues and ensured all DirectX updates were installed.
- Restored BIOS settings to default.
All these tests were conducted on the same machine, and essentially, I tried everything on this system.
Then, as anyone would, I tried the tests on another machine, specifically a laptop with an Intel processor and integrated graphics. I conducted simple recordings within the same range, and this time I didn't observe anything unusual. However, I didn't have the luxury of performing measurements under heavy system load as I did previously. Therefore, I concluded that the issue might be with my machine.
The sum of 50 samples opened with WPA from Windows ADK
Jeff Stokes
While reading the book "Windows Performance Analysis Field Guide" (1st edition), I came across Jeff Stokes, so I took the liberty of explaining the problem to him and asking for his opinion. He did find it peculiar and mentioned a bug in AMD drivers related to DPC watchdogs. Specifically, he noted:
"It's more likely a DPC GUID is reused from one trace to another, and it's piecing together the start and end times that are inverted. I know AMD GPUs still seem to have DPC watchdog events (last tried a 7900xtx and it was unreliable to me). This could be a legitimate reading from the AMD GPU, the real source of the TDR/DPC watchdog events AMD seems plagued with. Maybe it sends something with a negative timing, like the old AMD Opterons used to do."
Last thoughts
I should mention that otherwise, the system is functional, without issues like FPS drops, stutters, high latency, high ping, or screen tearing when playing games or during regular computer use. The recordings I make with FrameView and CapFrameX have never shown anything unusual; everything was normal.
So, what is really happening here? Am I wasting my time chasing ghosts, or have I truly encountered something interesting? I hope someone can shed light on this matter so we can proceed further.
I plan to conduct additional measurements under different conditions and on a system with an NVIDIA GPU to see if the results are consistent or different under similar conditions. Thank you very much to anyone who read all of this, and I apologize if I'm not fully clear. For any clarifications, please feel free to contact me!