Found this at Nvidia Forum explaining everything about TDR errors. This is just an Informational Thread:
What is TDR? Windows Vista has a new feature called Timeout Detection and Recovery (TDR). TDR attempts to detect problematic situations and recover to a functional desktop dynamically. In prior operating systems these situations would have resulted in a system freeze and forced customers to reboot their PC. More information about this Vista feature can be found here: http://www.microsoft.com/whdc/device/displ...dm_timeout.mspx
Therefore, when you see the TDR error message "Display driver stopped responding and has recovered", you know that the display driver is being reset without requiring a reboot.
Why does it happen? TDRs can occur for a variety of reasons, many of which are unrelated to the graphics card or graphics driver. Since Vista launched, NVIDIA has been working hard to address TDR issues that are specific to the graphics driver. Last year, we set up the NVIDIA Vista Quality Assurance Site to record and address user issues: http://www.nvidia.com/object/vistaqualityassurance.html
Since that time, we have resolved a majority of the issues specific to the graphics driver and have also worked directly with Microsoft to release five Vista hotfixes, all of which are now part of Vista Service Pack 1 (SP1). If customers are still experiencing issues, we recommend that they continue to file issues at the quality assurance site. We also recommend that customers look carefully into the wide range of other causes of TDR issues such as overclocked components, incorrect component compatibility and settings (especially memory configuration and timings), defective parts (memory modules, motherboards, etc.), insufficient system cooling, and insufficient system power. Throughout the forums, there are numerous reports of hardware solutions to TDRs.
Just short of a year ago I built a system that would crash after about an hour of intensive gaming on only two games... those were the only two games I played that were intensive enough to cause it. The reason was that my power supply was inadequate. Upgrading the power supply eliminated all the problems. http://forums.anandtech.com/messageview.as...hreadid=2116981
An nZone Forums user, jimbonbon, posted the following excellent information about TDRs. Please note that unless specifically stated by a Moderator or Administrator, information on the nZone forums is just the advice of other forum users; it does not constitute official support and is not a substitute for the assistance of a qualified technician. Please keep in mind that users of the NVIDIA User-to-User forums are not representatives of NVIDIA Corporation.
[quote name='jimbonbon' post='560166' date='Jul 1 2009, 03:57 AM']
Updated 24th February 2012:
- Additional information on potential driver bug triggering TDRs with 28x.xx and 290.xx drivers
- More resolution example posts
- Updated Microsoft definitions and links
This thread has grown quite lengthy, but hopefully is still useful for people. If you are having 'driver not responding' errors then please read this post so that you understand exactly what the issue is and how varied the causes can be - plus you may find a resolution within.
Just as a 'disclaimer' here, this thread is not intended for everyone to post their problems on... The purpose of this thread is to try and help you, but also to prevent multiple topics on the same subject. Lots of people have seen these errors, so hopefully this thread should help you understand exactly what you are seeing before you post. I have seen and responded to a lot of TDR related topics now where people have not made the effort to do any prior searching.
The generic error people visit this forum for is: 'Display driver nvlddmkm stopped responding and was recovered.'
This is also seen as:
'Display driver atikmdag stopped responding and was recovered.' (AMD/ATI cards)
'Display driver xxxxxxxx stopped responding and was recovered.' (others)
Also noted as nvlddmkm.sys, atikmdag.sys, and xxxxxxxx.sys bug-check/BSOD.
As a starting note, this is not an nVidia issue. It is not an ATI issue either. These errors are triggered by a Windows feature called 'Timeout Detection and Recovery' (TDR). You will only see this error on Windows Vista and Windows 7, as TDR is a feature of the new WDDM driver model (first implemented in Windows Vista). TDR is there to help prevent BSODs by resetting the GPU and/or driver when there is an issue or long delay. If the problem happens multiple times in a row, a BSOD can occur.
If you are having this problem frequently then you will probably find it very frustrating. Be reassured, however, that the problem is normally perfectly solvable, although it can take some troubleshooting to resolve. I personally have seen this issue on two separate nVidia builds, and an Intel onboard GPU.
How does TDR work?
Timeout Detection and Recovery
Windows Vista and later operating systems attempt to detect situations in which computers appear to be completely "frozen". They then attempt to dynamically recover from the frozen situations so that their desktops are responsive again. This process of detection and recovery is known as timeout detection and recovery (TDR). In the TDR process, the operating system's GPU scheduler calls the display miniport driver's DxgkDdiResetFromTimeout function to reinitialize the driver and reset the GPU. Therefore, end users are not required to reboot the operating system, which greatly enhances their experience. The only visible artifact from the hang detection to the recovery is a screen flicker. This screen flicker results when the operating system resets some portions of the graphics stack, which causes a screen redraw. Some legacy DirectX applications (for example, those DirectX applications that conform to DirectX versions earlier than 9.0) might render to a black screen at the end of this recovery. The end user would have to restart these applications.
The following sequence briefly describes the TDR process:
1. Timeout detection:
The GPU scheduler, which is part of the DirectX graphics kernel subsystem (Dxgkrnl.sys), detects that the GPU is taking more than the permitted amount of time to execute a particular task. The GPU scheduler then tries to preempt this particular task. The preempt operation has a "wait" timeout, which is the actual TDR timeout. This step is thus the timeout detection phase of the process. The default timeout period in Windows Vista and later operating systems is 2 seconds. If the GPU cannot complete or preempt the current task within the TDR timeout period, the operating system diagnoses that the GPU is frozen.
To prevent timeout detection from occurring, hardware vendors should ensure that graphics operations (that is, DMA buffer completion) take no more than 2 seconds in end-user scenarios such as productivity and game play.
2. Preparation for recovery:
The operating system's GPU scheduler calls the display miniport driver's DxgkDdiResetFromTimeout function to inform the driver that the operating system detected a timeout. The driver must then reinitialize itself and reset the GPU. In addition, the driver must stop accessing memory and should not access hardware. The operating system and the driver collect hardware and other state information that could be useful for post-mortem diagnosis.
3. Desktop recovery:
The operating system resets the appropriate state of the graphics stack. The video memory manager, which is also part of Dxgkrnl.sys, purges all allocations from video memory. The display miniport driver resets the GPU hardware state. The graphics stack takes the final actions and restores the desktop to the responsive state. As previously mentioned, some legacy DirectX applications might render just black at the end of this recovery, which requires the end user to restart these applications. Well-written DirectX 9Ex and DirectX 10 and later applications that handle Device Remove technology continue to work correctly. An application must release and then recreate its Direct3D device and all of the device's objects. For more information about how DirectX applications recover, see the Windows SDK.
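To make the sequence above a bit more concrete, here is a rough Python sketch of the decision the GPU scheduler is described as making. It is only an illustration, not how Dxgkrnl.sys is actually written: the `GpuTask` class, its two fields and the step descriptions are all made up for this example. Only the 2 second default and the `DxgkDdiResetFromTimeout` name come from the quoted text.

```python
from dataclasses import dataclass

TDR_TIMEOUT_S = 2.0  # default TDR timeout quoted in the text above


@dataclass
class GpuTask:
    """Hypothetical model of a GPU task, for illustration only."""
    name: str
    seconds_until_done: float       # time the task still needs to finish
    seconds_until_preempted: float  # time the GPU needs to honour a preempt request


def detect_timeout(task: GpuTask, timeout_s: float = TDR_TIMEOUT_S) -> bool:
    """True if the GPU can neither complete nor preempt the task within the timeout."""
    return min(task.seconds_until_done, task.seconds_until_preempted) > timeout_s


def handle_task(task: GpuTask) -> list:
    """Return the phases the text describes, as plain strings."""
    if not detect_timeout(task):
        return ["task completed or preempted in time, no TDR"]
    return [
        "1. timeout detection: GPU diagnosed as frozen",
        "2. preparation for recovery: DxgkDdiResetFromTimeout is called",
        "3. desktop recovery: video memory purged, GPU state reset, desktop redrawn",
    ]
```

A task that never finishes and never yields, for example `GpuTask("hung shader", float("inf"), float("inf"))`, goes through all three phases, while a slow task that can still be preempted quickly does not trigger a TDR at all.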
Limiting Repetitive GPU Hangs and Recoveries
Beginning with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008, the user experience has been improved in situations where the GPU hangs frequently and rapidly. Repetitive GPU hangs indicate that the graphics hardware has not recovered successfully. In these situations, the end user must shut down and restart the operating system to fully reset the graphics hardware. If the operating system detects that six or more GPU hangs and subsequent recoveries occur within 1 minute, the operating system bug-checks the computer on the next GPU hang.
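The "six or more hangs within 1 minute" rule is essentially a sliding-window counter. Below is a minimal sketch of that idea in Python. The `HangLimiter` class and its method names are invented for this example, and the real bookkeeping inside Windows may well differ. Only the limit of six and the one minute window come from the text above.

```python
from collections import deque


class HangLimiter:
    """Hypothetical sliding-window counter for GPU hang recoveries."""

    def __init__(self, limit: int = 6, window_s: float = 60.0):
        self.limit = limit        # recoveries tolerated inside the window
        self.window_s = window_s  # window length in seconds
        self._recoveries = deque()  # timestamps of recent recoveries

    def on_hang(self, now_s: float) -> str:
        """Return 'recover' or 'bugcheck' for a hang at time now_s (seconds)."""
        # Forget recoveries that have fallen out of the window.
        while self._recoveries and now_s - self._recoveries[0] > self.window_s:
            self._recoveries.popleft()
        # Six recoveries already inside the window: the next hang bug-checks.
        if len(self._recoveries) >= self.limit:
            return "bugcheck"
        self._recoveries.append(now_s)
        return "recover"
```

With hangs every 10 seconds, the first six calls return 'recover' and the seventh returns 'bugcheck'. With hangs every 30 seconds the window never fills up, so every hang is recovered.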
TDR Error Messaging
Throughout the TDR process (that is, the process of detecting and recovering from situations where a GPU stops operating), the desktop is unresponsive and thus unavailable to the end user. In the final stages of recovery, a brief screen flash occurs that is similar to the brief screen flash that occurs when the end user changes the screen resolution. After the operating system has successfully recovered the desktop, the following informational message appears to the end user.
'Display driver stopped responding and has recovered.'
The operating system also logs the preceding message in the Event Viewer application and collects diagnosis information in the form of a debug report. If the end user opted in to provide feedback, the operating system returns this debug report to Microsoft through the Online Crash Analysis (OCA) mechanism.
It is possible to make changes to the registry in order to increase the timeout period or turn TDR off entirely; however, please note that this is neither recommended nor supported, and in fact doing so is considered a Windows Logo Program violation.
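If you just want to see whether someone (or some tweak tool) has already changed these settings on your machine, a read-only check is harmless. The sketch below uses Python's standard `winreg` module and changes nothing. The key path and the `TdrLevel`/`TdrDelay` value names are as I recall them from Microsoft's TDR registry documentation, so please verify them there; the quoted text above does not list them. The fallback defaults (2 seconds, level 3) are likewise my assumption of the documented defaults. The values are normally absent unless somebody has added them.

```python
import sys

# Assumed location of the TDR values; check Microsoft's documentation.
GRAPHICS_DRIVERS_KEY = r"System\CurrentControlSet\Control\GraphicsDrivers"

# Assumed defaults that apply when a value is not present in the registry.
DEFAULTS = {"TdrLevel": 3, "TdrDelay": 2}


def read_tdr_settings() -> dict:
    """Read TDR values without modifying anything; fall back to the defaults."""
    settings = dict(DEFAULTS)
    if sys.platform != "win32":
        return settings  # no registry to read on other platforms
    import winreg
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GRAPHICS_DRIVERS_KEY)
    except OSError:
        return settings
    with key:
        for name in settings:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
                settings[name] = value
            except FileNotFoundError:
                pass  # value not set, keep the default
    return settings
```

If `read_tdr_settings()` returns something other than the defaults, the timeout has been altered at some point, which is worth knowing before you start troubleshooting.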
For the original Microsoft links to the above quoted information, please see here
Now that you know exactly what the error is, you probably want to stop it happening. I would like to tell you that there is a one-stop fix that I could recommend, but unfortunately, TDR events can be caused by many different problems. First off though, if your computer was bought 'off-the-shelf' and is brand new, then you should think about talking to whoever you bought it from.
Common issues that can cause a TDR:
- Incorrect memory timings or voltages
- Insufficient/problematic PSU
- Corrupt driver install
- Unstable overclocks (GPU or CPU)
- Incorrect MB voltages (generally NB/SB)
- Faulty graphics card
- A badly written driver or piece of software, but this is an unlikely cause in most cases
- Driver conflicts
- Another possibility, which people tend not to like to hear, is that you are simply asking too much of your graphics card. If your settings are too high and the card is already struggling at very low FPS when something graphically complex occurs, the GPU may not be able to respond in time and a TDR error may occur
- Some users have experienced TDR errors whilst browsing the web with the 280.xx, 285.xx and 290.xx drivers. Please head to this link to check whether this is relevant to you - this is quite a specific issue which seems to predominantly affect web browsing as opposed to gaming. There are no definitive fixes but some users have found that changing the power management mode to 'Prefer Maximum Performance' has helped.
Examples of specific TDR causes:
Things to check or consider initially in your troubleshooting:
- Check for newer driver version or cleanly uninstall/re-install your drivers. Great description of how to do this here (full credit to DJNOOB for this).
- If you have multiple 'GPU tools' like EVGA Precision and MSI Afterburner installed, consider that it is only advisable to have one tool such as this at any one time.
- If the issue is only with a specific game, check for patches.
- If this is a new problem for you, have you just added any new hardware or updated/installed any new drivers? Consider rolling them back.
- Check temperatures. It's important you check these at load, which is generally when a TDR event will occur. Everest Ultimate Edition is a good tool for this, or OCCT's GPU stress test. If things are too hot, you can use tools such as EVGA Precision to increase GPU fan speeds on graphics cards. Cleaning your system of dust can help temperatures significantly. Common sense will normally tell you if something is too hot, but if you aren't sure, the information is generally available online.
- Check that your RAM is running at the correct settings as defined by the manufacturer.
- Remove any overclocks on your system and test with stock clocks. This includes memory, CPU and GPU (even factory OC'd cards). Best to try each separately so you can be sure if one solves the issue.
- Attempt a CMOS reset to return all BIOS settings to default. This is a good hardware troubleshooting step as it also resets the IRQ assignments - you can normally reset the CMOS either through a jumper on the motherboard (see manual), or by disconnecting the mains power and taking out the motherboard battery for 5 minutes. You will likely need to go into the BIOS after this reset to check that the memory timings/voltages are correct, as these will not always be set correctly automatically.
- Run memtest (memtest.org). This should complete with NO errors.
- If you have just installed a new graphics card, check your PSU ratings. Is it providing enough power, and most importantly enough Amps on the 12V rail?
- If you are using SLI, try each card separately to see if the fault lies with one.
- Try graphics card/cards in another computer if you can.
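For the PSU check in the list above, the arithmetic is just watts = volts x amps. Here is a small Python sketch for sanity-checking a 12V rail. The function names, the 20% safety margin and the numbers in the usage note are my own assumptions for illustration; take the real figures from the label on your PSU and from your graphics card vendor's stated requirements.

```python
def rail_watts(amps: float, volts: float = 12.0) -> float:
    """Power a rail can deliver, from the amp rating printed on the PSU label."""
    return amps * volts


def has_headroom(rail_amps: float, load_watts: float, margin: float = 0.2) -> bool:
    """True if the 12V rail covers the estimated load plus a safety margin.

    load_watts is your own estimate of everything drawing from the 12V rail
    (mainly GPU and CPU); the 20% default margin is an arbitrary assumption.
    """
    return rail_watts(rail_amps) >= load_watts * (1 + margin)
```

For example, a rail rated at 30A gives `rail_watts(30)` = 360W, so `has_headroom(30, 250)` is True, while a 20A rail (240W) fails the same check for a 250W load.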
As most people who end up reading this will have slightly custom computers in one way or the other, please try to remember that checking things like RAM timings & PSU voltage go hand in hand with modifying or building a computer. A lot of people assume that any hardware they buy and plug in should just work, and any software they then install should be fine also... this is not entirely true. No hardware or software vendor can truly recreate all of the different possible combinations, so do expect some tinkering to be required every once in a while.
For those with laptops, I appreciate there are a lot of steps here you cannot complete. However, the confined space of a laptop plus dust and age can mean that overheating is a real possibility. Beyond this you need to look at reinstalling drivers and software, and then you should be looking at potential hardware issues and likely an RMA (assuming of course you have not been overclocking in software or making changes in the BIOS).
Programs to use for stress testing CPU:
- Prime95 (would advise running for at least a few hours).
- IntelBurnTest (run at least a few passes)
- OCCT (good linpack test for CPU)
Programs to use for stress testing GPU:
- 3DMark Vantage
- 3DMark 11 (DX11 GPUs only)
- Any of the Crysis series
Programs to use for monitoring temperatures:
- EVGA Precision (GPU only)
- MSI Afterburner (GPU only)
- Everest Ultimate Edition (now known as AIDA64)
- CoreTemp (CPU only)
- RealTemp (CPU only)
- OCCT (stress testing and temp monitoring)
I can highly recommend Everest/AIDA64 as this shows you ALL your temperatures, including other GPU components. It is however not free - you can download a trial but it has some functions limited (including some temperatures).