Today, AMD is launching their Ryzen Threadripper 3990X — a 64-core CPU with a base clock of 2.9GHz, a boost clock of 4.3GHz, and a 256MB L3 cache. Originally, our plan was to present a deep-dive review on the CPU, briefly discuss the overclocking project, and follow up with an in-depth article on the overclocking component on Monday.
Unfortunately, a family emergency has yanked me away from the keyboard and that plan is going to need some modification. While I’ve got most of the data I needed for the review, I need some time to pull it together. Instead of burying you in charts and graphs, I’m going to talk a little about this CPU, what I’ve seen, and what I think it means.
First, here’s a little something to whet your appetite:
Stock score on the 3990X: 25,394. The 3970X scores 17,286.
At all-core 4GHz, the 3990X is 1.21x faster than stock and 1.81x faster than the 3970X.
As of February 6th, 2020, that’s the highest single-socket Cinebench R20 score in the world, according to HWBot. It’s the fourth-highest Cinebench R20 score overall. I achieved it using an Asus Zenith II Extreme motherboard and AMD’s Ryzen Threadripper 3990X at an all-core locked clock of 4GHz on 64 cores. That’s 256GHz altogether, or 0.256THz. While it obviously doesn’t scale the way a 0.256THz single-core CPU would, that’s how fast the CPU is running, in aggregate.
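The aggregate-clock figure above is simple multiplication. The short Python sketch below reproduces it and also works out the stock-to-stock ratio implied by the two Cinebench R20 scores quoted above (25,394 vs. 17,286). The function names are illustrative. Treat the summed clock purely as bookkeeping: it is not a model of how Cinebench, or any other workload, scales across cores.

```python
# Cinebench R20 scores quoted above (stock configurations).
STOCK_3990X = 25_394
STOCK_3970X = 17_286

CORES = 64
ALL_CORE_CLOCK_GHZ = 4.0


def aggregate_clock_ghz(cores: int, clock_ghz: float) -> float:
    """Sum of per-core clocks: a bookkeeping figure, not a performance prediction."""
    return cores * clock_ghz


def speedup(score: float, baseline: float) -> float:
    """Ratio of two benchmark scores (higher score = faster)."""
    return score / baseline


agg = aggregate_clock_ghz(CORES, ALL_CORE_CLOCK_GHZ)
print(f"Aggregate clock: {agg:.0f} GHz ({agg / 1000:.3f} THz)")
print(f"Stock 3990X vs. stock 3970X: {speedup(STOCK_3990X, STOCK_3970X):.2f}x")
```

Running it prints an aggregate of 256GHz (0.256THz) and a stock-to-stock ratio of roughly 1.47x.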
If all goes well, I’ll break that record and formally log the results over the weekend, then talk about the OC project on Monday. I will say this, though: the power and current challenges of running a high all-core clock across this many cores are formidable. I don’t yet know the highest clock I can hold in stable operation, and I hope to answer that question this weekend.
So. With that teaser out of the way, let’s talk about the 3990X and its performance and positioning at stock configurations.
What the 3990X Brings to the Table
The first and most important thing to understand about the 3990X is that this is not a CPU for everyone. The vast majority of applications are not designed to scale this high. Windows itself is not designed to scale this high. Microsoft’s support for more than 64 threads in Windows is a bit of a kludge.
Since Windows 7 and Windows Server 2008 R2, Microsoft has handled systems with more than 64 logical processors in a specific fashion: by creating processor groups. Each group contains up to 64 logical processors (an SMT thread is treated identically to a physical core), though Windows does use locality awareness to keep a physical core and its SMT sibling in the same processor group when possible. By default, however, an application is confined to a single processor group, which means it can use only 50 percent of the 3990X’s 128 threads. (You can read more on this topic at Bitsum.) There are ways around this: applications can implement their own schedulers that take better advantage of a high core-count CPU.
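To make the processor-group arithmetic concrete, here is a deliberately simplified Python model. The function names are hypothetical, and the sketch assumes only the rules described above: each group holds at most 64 logical processors, SMT siblings stay in the same group, and a process is confined to one group by default. It ignores NUMA topology and the other factors real Windows systems weigh when building groups.

```python
MAX_GROUP_SIZE = 64  # Windows caps each processor group at 64 logical processors


def build_processor_groups(physical_cores: int, smt: bool) -> list[list[int]]:
    """Hypothetical, simplified model of processor-group assignment.

    Packs logical processors (LPs) into groups of at most 64, keeping both
    SMT siblings of a physical core in the same group. NUMA is ignored.
    """
    threads_per_core = 2 if smt else 1
    cores_per_group = MAX_GROUP_SIZE // threads_per_core
    groups: list[list[int]] = []
    next_lp = 0
    for first_core in range(0, physical_cores, cores_per_group):
        cores_here = min(cores_per_group, physical_cores - first_core)
        size = cores_here * threads_per_core
        groups.append(list(range(next_lp, next_lp + size)))
        next_lp += size
    return groups


def default_visible_threads(groups: list[list[int]]) -> int:
    """Assumes the default described above: a process runs within a single group."""
    return len(groups[0])


configs = [("3970X", 32, True), ("3990X", 64, True), ("3990X, SMT off", 64, False)]
for name, cores, smt in configs:
    groups = build_processor_groups(cores, smt)
    total = sum(len(g) for g in groups)
    visible = default_visible_threads(groups)
    print(f"{name}: {len(groups)} group(s), {visible}/{total} threads visible by default")
```

Under these assumptions, the 3970X’s 64 threads fit in a single group, while the 3990X’s 128 threads split into two groups of 64. Disabling SMT collapses the 3990X back into one 64-thread group, which is why SMT-off configurations can help some applications.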
In practice, this means Linux often scales better on the 3990X than Windows does. Rob Williams at Techgage has done a great deal of Linux testing, and I’d recommend his article if you want a specific comparison of scaling across the two operating systems.
Under Windows, the 3990X delivers significant performance uplifts over the 3970X in several areas. Rendering is easily the CPU’s strongest category; a number of rendering engines show uplifts over the 3970X ranging from 1.3x to 1.6x, depending on the application. As part of this review, I bought access to the Blender Cloud in order to test some of the professional-quality scenes it provides. The more than 30 render tests I ran in Blender alone confirmed that users of this application can look forward to strong scaling, though the exact gain depends on the type of scene. We’ll also examine how the 3990X and the 3970X compare when running multiple workloads simultaneously.
Because the Windows scheduler confines applications to 64 threads by default, there are a handful of cases where turning SMT off on the 3990X improves performance. We’ll examine those as well and discuss whether there’s a case for the CPU as a 64C/64T chip compared with the 3970X. We’ll include performance figures for Cascade Lake and the 10980XE, not because Intel is competing directly against the 3990X with that chip, but because it’s important to include the best representative figures we can, and Intel is currently making its own argument at the $1,000 price point. There are a few tests where the 10980XE pulls ahead, core counts notwithstanding. With a chip this expensive, I wanted to explore the nooks and crannies of the performance landscape.
One reason this review will take a bit longer to pull together is that I’m also working with benchmarks we haven’t used before, including Agisoft Metashape, Pix4D, DaVinci Resolve, and Maya 2020 (with a CPU-stretching benchmark created by Antonio Bosi), plus a heck of a lot of Blender. We’ve got applications where the 3990X proves its worth (if you work in this kind of professional market, at least) and, yes, tests that demonstrate you’d really be better off with a 3970X.
We’ll also have more overclocked benchmark results and, if things go my way, a few higher scores to crow about. It’ll be worth the wait.
I’m going to hold some of my thoughts back for the actual review, but I’ll say this here: the 3990X is a very exciting CPU, even if it isn’t a chip that makes sense for most people to buy.
Testing this chip reminded me that there was a time when we waited on operating systems and applications to be able to take advantage of CPU features.
The first iteration of Hyper-Threading only worked properly if you ran Windows XP SP1 (itself still fairly new at the time) or Windows 2000 with SP4 installed. We waited on applications to add SSE2 support for the Pentium 4. We waited for 64-bit Windows and native 64-bit applications, just as we waited for 32-bit apps and OS support in an earlier era. Now, thanks to the 3990X, we’re waiting for Microsoft to improve how it handles high core-count CPUs.
The Difference in Approach Between AMD and Intel
AMD, to be clear, isn’t the first company to run into this problem with Windows. Intel systems that exceed 64 threads, such as dual-socket Xeon workstations, encounter the same issue. Intel, however, has kept its core counts much lower and its price per core much higher. The Xeon W line, which is intended for workstations, scales up to 28 cores in a single socket but doesn’t support dual-socket configurations. I checked prices at Dell: a dual Xeon Gold 6252 workstation (24C/48T, 2.1GHz base, 3.7GHz Turbo) starts at $10,138. The same system with a Xeon Bronze CPU starts at $1,579. That’s an $8,559 upgrade fee for two CPUs that offer just 75 percent of the Threadripper 3990X’s core count, at more than twice its list price.
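The Dell comparison is easy to check. The sketch below reruns the arithmetic and adds a rough per-core figure. One input is an assumption not stated in the text: AMD’s $3,990 launch price for the 3990X, used as its list price. Like the comparison above, it treats the configurator’s price difference as the cost of the two Gold 6252 CPUs.

```python
# Dell configurator prices quoted above.
DUAL_GOLD_6252_SYSTEM = 10_138  # workstation with two Xeon Gold 6252 CPUs
BRONZE_SYSTEM = 1_579           # same system with a Xeon Bronze CPU
# Assumption: AMD's $3,990 launch price for the Threadripper 3990X (not given in the text).
TR_3990X_PRICE = 3_990

XEON_CORES = 2 * 24  # 24 cores per Xeon Gold 6252
TR_CORES = 64

# Treat the price difference as the cost of the two Gold CPUs, as the comparison does.
upgrade_cost = DUAL_GOLD_6252_SYSTEM - BRONZE_SYSTEM
core_share = XEON_CORES / TR_CORES
cost_ratio = upgrade_cost / TR_3990X_PRICE

print(f"Upgrade fee: ${upgrade_cost:,}")
print(f"Xeon cores as a share of the 3990X: {core_share:.0%}")
print(f"Upgrade fee vs. 3990X price: {cost_ratio:.2f}x")
print(f"Price per core: Xeon ${upgrade_cost / XEON_CORES:.0f}, "
      f"3990X ${TR_3990X_PRICE / TR_CORES:.0f}")
```

Under that assumption, the two Xeons cost roughly $178 per core versus roughly $62 per core for the 3990X.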
AMD’s much lower price per core should push high core-count CPUs into more professional markets, which will, in turn, encourage Microsoft and Linux developers to support them better.
Finally — because the workstation market doesn’t just respond to core count — we’ll also be examining some performance cases where Cascade Lake remains a better option. Applications that don’t scale particularly well with core count sometimes run significantly better on Intel hardware. I’ll tell you upfront that Cascade Lake does win a few tests against the 3990X. That’s why it’s important to understand the various characteristics of the CPU before buying it.
I regret that I didn’t have the full review finished in time for you to read it this morning. I hope that what I’ve laid out here, my version of a “Coming Soon,” gives you something to look forward to, along with a fair sense of my thoughts on the CPU, even if I need a few more days to finish the project.