AMD has published a blog post discussing how temperatures and thermals are calculated on its Navi GPUs. There has been some concern in the enthusiast community about the temperatures posted by reference cards, given that these GPUs can report thermal junction temps of up to 110 degrees Celsius. This is substantially hotter than the old temperature of 95 C, which used to be treated as a thermal trip point.
Beginning with Radeon VII, AMD made significant changes to how it measures temperature across the GPU die. In the past, AMD writes, “the GPU core temperature was read by a single sensor that was placed in the vicinity of the legacy thermal diode.” That single reading was used to make decisions governing the GPU's voltage and operating frequency. Radeon VII and now Navi do things differently. Instead of relying on a single sensor, they use data gathered from a network of sensors spread across the GPU. AMD has deployed the same AVFS (Adaptive Voltage and Frequency Scaling) strategy that it uses for Ryzen to maximize performance of its GPUs.
AVFS deploys a network of on-die sensors across the entire chip rather than relying on a single point of measurement. Rather than calibrating voltages and frequencies at the factory and preprogramming a series of defined voltage and frequency steps that all CPUs must achieve, AVFS dynamically measures and delivers the voltage required for each individual CPU to hit its desired clock frequencies. This allows for finer-grained power management across the CPU, improving both performance and power efficiency across a range of targets.
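To make the contrast concrete, here is a minimal Python sketch of the two approaches described above. It is an illustration only, not AMD's implementation: the frequency/voltage points are invented, the function names are hypothetical, and linear interpolation stands in for whatever per-chip curve AVFS actually measures.

```python
from bisect import bisect_left

# Hypothetical (frequency in MHz, voltage in V) points, invented for illustration.
POINTS = [(800, 0.80), (1200, 0.90), (1600, 1.05), (2000, 1.20)]


def discrete_voltage(target_mhz, states=POINTS):
    """Fixed-step model: use the voltage of the lowest state that reaches the target."""
    freqs = [f for f, _ in states]
    i = bisect_left(freqs, target_mhz)
    if i == len(states):
        raise ValueError("target frequency is above the highest state")
    return states[i][1]


def curve_voltage(target_mhz, curve=POINTS):
    """Continuous model: interpolate linearly between measured points (an assumption)."""
    freqs = [f for f, _ in curve]
    if not freqs[0] <= target_mhz <= freqs[-1]:
        raise ValueError("target frequency is outside the measured curve")
    i = bisect_left(freqs, target_mhz)
    if freqs[i] == target_mhz:
        return curve[i][1]
    (f0, v0), (f1, v1) = curve[i - 1], curve[i]
    return v0 + (v1 - v0) * (target_mhz - f0) / (f1 - f0)


if __name__ == "__main__":
    for mhz in (1200, 1400, 1900):
        print(mhz, discrete_voltage(mhz), round(curve_voltage(mhz), 4))
```

In this toy model, a target that falls between two steps pays for the next step's full voltage under the discrete scheme, while the curve supplies only what that frequency needs. That gap is the kind of efficiency headroom the article attributes to finer-grained power management.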
The 110-degree junction temperature is not evidence of a problem or a sudden issue with AMD graphics cards. AMD now measures its GPU temperature in new locations and reports additional data points that capture this information because it adopted more sophisticated measuring methods. Arguing that the company should be penalized for reporting data more accurately is akin to arguing that manufacturers ought to hide data because they’re afraid some customers won’t understand it or put it in the proper context.
AMD provides a pair of graphs to illustrate the difference between its Vega 64 and earlier measurement system and how it calibrates voltage on the 5700 XT today. The old discrete state method is shown below:
Now, compare that against the frequency/voltage curve for the 5700 XT.
The 5700 XT is designed to continue boosting performance until it hits its thermal junction threshold. From the company’s blog post:
Paired with this array of sensors is the ability to identify the ‘hotspot’ across the GPU die. Instead of setting a conservative, ‘worst case’ throttling temperature for the entire die, the Radeon RX 5700 series GPUs will continue to opportunistically and aggressively ramp clocks until any one of the many available sensors hits the ‘hotspot’ or ‘Junction’ temperature of 110 degrees Celsius. Operating at up to 110C Junction Temperature during typical gaming usage is expected and within spec. This enables the Radeon RX 5700 series GPUs to offer much higher performance and clocks out of the box, while maintaining acoustic and reliability targets.
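The behavior AMD describes can be sketched as a simple control step: take the hottest reading from the sensor array and keep raising clocks until it reaches the junction limit. The Python below is a rough illustration under stated assumptions. Only the 110 C limit comes from AMD's post; the function names, step size, and clock bounds are hypothetical, and the real firmware logic is not public in this form.

```python
JUNCTION_LIMIT_C = 110.0  # junction ("hotspot") limit quoted in AMD's post


def hotspot(readings_c):
    """The hotspot is simply the hottest of all on-die sensor readings."""
    return max(readings_c)


def next_clock_mhz(current_mhz, readings_c,
                   step_mhz=25.0, min_mhz=1000.0, max_mhz=2000.0):
    """One control step. step_mhz, min_mhz, and max_mhz are made-up values."""
    if hotspot(readings_c) >= JUNCTION_LIMIT_C:
        # Any single sensor at the limit is enough to back off.
        return max(min_mhz, current_mhz - step_mhz)
    # Otherwise keep ramping opportunistically, up to the clock ceiling.
    return min(max_mhz, current_mhz + step_mhz)


if __name__ == "__main__":
    print(next_clock_mhz(1800.0, [70.0, 85.0, 109.5]))  # below the limit: ramps up
    print(next_clock_mhz(1800.0, [70.0, 110.0, 90.0]))  # one sensor at the limit: backs off
```

The point of the sketch is that the limit applies to the hottest point on the die rather than to a single conservative reading, so a card can report a high junction temperature while most of the die is considerably cooler.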
There’s a certain knee-jerk “I don’t want 110-degree anything in my case!” reaction from enthusiasts that’s both perfectly understandable and somewhat misguided. There’s an unconscious underlying assumption that 110 degrees Celsius represents a dangerous temperature (it doesn’t) or requires an extremely loud cooler. The 5700 XT and 5700 are much quieter than Vega 64, but if that’s still too loud, third-party cards are starting to hit the market. Companies like Asus were able to build coolers that handled the R9 290X beautifully, so the 5700 XT should be tamable as well.
Higher temperatures are partially an artifact of better measurement. They’re also a reality of advanced silicon manufacturing nodes. Our ability to pack transistors closer together has outstripped our ability to reduce their power consumption by cutting operating voltages. As a result, increasing transistor density increases hot spot formation and raises peak temperatures. AVFS helps mitigate this tendency by ensuring that operating voltage is precisely mapped to frequency, but it can’t fix the fact that AMD has packed more transistors into a smaller space, leading to higher thermal density.
Higher temperatures are not an intrinsic reason to be concerned about a product provided the manufacturer certifies that this is expected behavior. When I got into computing, a CPU temperature of 50 C (measured via in-socket thermistor) was considered extremely high. Today, Intel and AMD build silicon that can operate reliably at 95C or above for years at a time.
Why 110-Degree Temps Are Normal for AMD's Radeon 5700, 5700 XT - ExtremeTech
That is hotter than my hot water dispenser which is 185 F to make a cup of instant coffee
I get what AMD is saying. Now many reviews and tube sites are showing these cards are beginning to throttle and even artifact at temperatures in the 90's.
I can say that it does not shock me. Vega owners complained that those cards had issues at lower than the published limits.
I as an RX 580 owner can tell you that my processor will start getting flaky at about 76 degrees, which is way lower than the mid-90s limit AMD gives it.
So while AMD assigns a limit, these limits may not really mean much for many if not most cards, based on a lot of people out there already complaining of issues and on the prior couple of generations exhibiting the same type of problems. Your mileage will vary and you may throttle at lower temps. To me that means that AMD was and is having quality issues maintaining what are supposed to be their thermal limits. Bottom line is AMD again seems to be struggling to have their cards run at default settings.
I for one wouldn't want that toaster on my desk either. It was part of why I stopped using my RX 580. It ran so hot I had to leave the case open with a fan blowing in it all the time and that card ran cooler than Vega and now Navi.
Good to know the difference regardless in the method AMD is using to get the temperature measurement.
I use a lot of fans in my chassis to move air with a vengeance
I have a large case with great ventilation. I have been using many case fans since long before it was a popular gaming thing to do.
My gaming rig has 2 top 120 mm fans. 3 120's on the front. 2 120's on the side and 2 more 80 mm on the back. And that case still got too hot when closed up with the RX 580. I had to open the side and place a box fan blowing on it. My current card never gets above 72 under full load and that is with the case all closed up again, the way it should be.
I use front intake fans and rear/top exhaust fans. No side fans.
This provides a solid flow of air through the chassis to remove heat rapidly.
Side case fans don't change that dynamic. You just have to keep positive air flow. If you have a glass sided case you likely don't have side fans. Most metal sided cases allow for side fans. I like more larger fans as you get plenty of air but they run silent.
my chassis has a plexiglass side panel
most corsair chassis are like mine with front to rear airflow design
I have had my case for years. Until I need a new one I will not go glass. I probably won't get a case like mine again without going glass on a new one. They just don't seem to be made like mine anymore. I'm kinda cheap like that. I will reuse as long as I can. I don't care if my case looks fancy. I think they look cool all decked out but I wouldn't spend a dime to make mine look pretty. Just want it to run my games and apps well is all I care about. But you are right plenty of cooling with positive air flow is key to a happy computer.
The Corsair carbide series come in a range of options but I have the more basic ATX version which has 4 drive bays as some only have 2
the reason for more drive bays is the growing number of games I have
The drive bays are one of the reasons I keep mine. It has 4 external and 6 internal 5 1/4. I too have a lot of games so I get the need for a lot of drives! I did move to a couple 5 TB drives and took out 4 older 1 TB drives the last time I upgraded. I am now running out of space again. But I have room still for more drives... I too have noticed many cases coming with fewer bays. I guess most people are now using an M.2 or two and maybe a 3.5 ssd and only one large HD. So I get that for many you just don't need all the bays anymore.
I have yanked older 2 TB disks for 4 TB and now I am looking at deploying 8 TB hard disks