During testing with Prime95, one core of my 5950X generates an error:
ERROR: Prime95 seems to have stopped with an error!
ERROR: At Core 1 (CPU 2)
ERROR MESSAGE: FATAL ERROR: Rounding was 0.5, expected less than 0.4
ERROR: The last *passed* FFT size before the error was: 12800K
ERROR: Unfortunately FFT size fail detection only works for Smallest or Small FFT sizes.
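Some context on the error message itself. Prime95 multiplies huge numbers using floating-point FFTs. Every exact result of that multiplication should be an integer, so it checks how far the computed values drift from the nearest integer. The largest possible drift is 0.5, so a reading of 0.5 suggests the computation actually went wrong rather than just picking up a little floating-point noise. Below is a minimal Python sketch of that kind of round-off check. It assumes a plain textbook FFT squaring a number stored as base-10 digits, and the function names are hypothetical. The 0.4 limit is taken from the error message above. Prime95's real implementation is far more elaborate, so read this as an illustration of the idea only.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def square_digits(digits):
    """Square a number given as little-endian base-10 digits via FFT convolution.

    Returns the (uncarried) convolution coefficients and the maximum distance
    of any floating-point coefficient from its nearest integer.
    """
    n = 1
    while n < 2 * len(digits):
        n *= 2
    padded = [complex(d) for d in digits] + [0j] * (n - len(digits))
    spectrum = [x * x for x in fft(padded)]  # pointwise square in frequency domain
    raw = [x.real / n for x in fft(spectrum, invert=True)]  # inverse FFT, scaled
    max_roundoff = max(abs(x - round(x)) for x in raw)
    return [round(x) for x in raw], max_roundoff


def check_roundoff(max_roundoff, limit=0.4):
    """Hypothetical stand-in for a 'Rounding was X, expected less than 0.4' check."""
    if max_roundoff >= limit:
        raise RuntimeError(
            f"Rounding was {max_roundoff:.2f}, expected less than {limit}"
        )


if __name__ == "__main__":
    coeffs, roundoff = square_digits([3, 2, 1])  # 123 as little-endian digits
    check_roundoff(roundoff)  # a correct computation leaves only tiny round-off
    print(coeffs, roundoff)
```

When the arithmetic is correct, the measured round-off stays far below the limit. A value at or near 0.5 means at least one coefficient landed about halfway between two integers, so the result can't be trusted.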
Setup:
- stock CPU clocks and stock memory speed (no XMP enabled, 2133 MT/s)
- temperatures during testing stay within 60-68 °C (280mm AIO)
- Asus X570 Crosshair VIII Wifi with the latest BIOS (3302) and Optimized Defaults loaded
- Corsair Dominator DDR4-3200 64GB (Hynix CJR)
- Corsair RM750x
- fully updated Windows 10 and the latest AMD chipset drivers
- no other software I use finds any errors
To fix it, I used Curve Optimizer to raise the voltage slightly on this specific core: a +5 offset on this core alone resolved it, and Prime95 now completes successfully on Core 1 🙂
Question: there can be causes other than a faulty CPU, for instance memory, the PSU or the motherboard. But since raising the voltage of a single CPU core fixed it, logic tells me this is CPU related, not memory related: I don't see any causal link between faulty memory and increasing the voltage of one core only. Can I assume this CPU is not 100% okay, or should I investigate further to rule out other components? (Ideally I'd test it on another motherboard, of course, but I don't have many unused ones lying around. Zero, to be exact.) And if this is a CPU fault, will this manual fix, the +5 Curve Optimizer offset on this one core, be a solid fix for future use? I'm asking especially since the affected core is also my fastest core (the yellow star in AMD Ryzen Master). Boost clocks don't seem to be affected by the +5 offset, however: 5,050 MHz (effective clock a bit over 5,000 MHz) both before and after the change.
All this assumes, of course, that a CPU at stock settings should not need manual fiddling with core voltages to be stable.
Edit: I ran the Windows Memory Diagnostic tool (mdsched): no errors.
The only way to properly test memory in software is to run MemTest86 for a minimum of 24 hours.
While it's possible the CPU is defective, it's much more likely that you have marginal RAM.
Just to make sure, I'll run MemTest86 tonight. But I would expect that if the RAM were at fault, the Prime95 error would occur randomly across all CPU cores, not consistently on one and the same core. Data in RAM is either corrupt or it isn't; it isn't 'semi-corrupt' in a way that always hits the same core while the other 15 cores are unaffected. Statistically highly improbable, I'd think.
If it's a highly predictable workload, in terms of memory size, then it's quite possible for the same region of memory to end up on the same core multiple times.
I agree it seems suspicious, but the likelihood of faulty RAM is roughly an order of magnitude higher than that of a faulty CPU.
All that said, it's also possible your sample just failed to be binned correctly, and that core needs more voltage than it should. You could RMA the processor, but if your workaround is indeed properly stable, you're better off sticking with that, given the stock situation for those processors.
MemTest86 was a PASS (even overclocked from 3200 MT/s to 3600 MT/s). That's good 🙂
So it seems AMD either didn't determine the electrical characteristics of the chip correctly during the testing/binning phase, or fused incorrect characteristic data into the chip. Or the BIOS (AGESA) is somehow mishandling it, or it's a combination of both. I can't think of another cause.
I'll do some further boost behavior testing and wait for the next AGESA version to see whether the issue remains. If the +5 Curve Optimizer setting doesn't affect performance or temperature, I can live with it (an RMA is a bit of a hassle), but a slight feeling of disappointment will probably remain in the back of my head 😕