TL;DR: I've done everything short of buying a new CPU/motherboard and still haven't pinned down the problem. I've isolated it to the CPU or the motherboard, but I'm unsure where to take it from here.
Motherboard: ASUS Crosshair V Formula-Z
Motherboard Revision : 1.xx
Motherboard BIOS Revision : 2101
CPU: AMD FX-9370 (220W)
Voltage: 1.45V (Tested Stable at 1.31V)
Cooler: Cooler Master Hyper 212 EVO (formerly Corsair H80i)
RAM: 32GB (4x8GB) AMD Radeon Gamer Series 2133MHz
Configuration: AMP/XMP Profile 1
GPU: ASUS R9-290X-DC2OC-4GB
GPU Firmware version: 1.2
GPU Driver version: Catalyst 15.7
SSD: Samsung 840 PRO 256GB (Firmware: DXM06B0Q)
HDD1: Hitachi 2TB
HDD2: WD Black 4TB
PSU: SeaSonic 1050W (formerly Antec TP750)
OS: Windows 8.1 64-bit
Display configuration: 1920x1200 (Main), 1280x1024 (Secondary)
The problem - intermittent hangs in the system - started in late March and was accompanied by a string of unwelcome events, including an unexpected power outage and high CPU temperatures due to an underperforming Corsair H80i (this was quickly resolved).
Testing for software fault: As usual, I took the problem as a sign of silent corruption from soft memory errors and went about my usual routine of reformatting my OS drive (SSD). The problem lingered, however, even after a clean reinstall of all current drivers (I keep my drivers up to date anyway, but fresh installations help). At this point I had to wait for newer driver versions. Note: Catalyst 14.12 and 15.4 beta were both stable before the problem appeared, and neither resolved the issue.
Testing for driver/firmware updates as solution: Unfortunately, during this initial round of diagnostics, my system hard-froze while I was performing a firmware update on my GPU. I RMAed the GPU and installed my older HD 5870 to use in the meantime. At first the problem appeared to go away, but it eventually reappeared as I increased the load on the system.
BIOS was already up to date.
Updated SSD firmware - no improvement.
Testing clock configuration: At this time, I caught up on overclocking theory for AM3+ CPUs. I spent several days adjusting the CPU clock settings and stress testing with a combination of Prime95, ROG RealBench, 3DMark, Unigine Heaven, and AMD OverDrive. During this time, I found that at stock clock settings the CPU remained stable down to a minimum of 1.31V and did not exceed 63°C under full load. I adjusted the voltage to 1.36V for headroom and continued.
RAM settings adjusted to AMP #1 after CPU configured.
At some point after this setup the problem was still occurring, so I reset the BIOS to defaults and set the RAM to AMP. I noted that the CPU voltage defaulted to 1.5V.
RMA: The GPU returned from RMA; the new GPU exacerbated the problem. I contacted ASUS, who assumed a faulty replacement unit and replaced that one as well. The situation remained. I proceeded to hardware diagnostics.
Cooling: (Note: ASUS AI Suite II's Probe II was reporting alarming voltage fluctuations that coincided with the system hangs, so the PSU was considered as a cause. These alerts continued even when Corsair Link and other hardware monitoring programs were not installed.) Temperatures were not notably high, given how much attention the system internals were getting at the time. Still, the Corsair H80i could not provide sufficient cooling, and dust buildup over long-term use (3+ months) made it worse. I resolved to replace the case, CPU cooler, and PSU with superior models: the Antec case with a Thermaltake Core V71, the Corsair H80i with a CM Hyper 212 EVO, and the Antec TP750 with a SeaSonic 1050W. The Probe II alerts subsided. The system hangs continued.
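For anyone who wants to sanity-check monitoring alerts like these against a fixed threshold, here is a minimal Python sketch. It assumes the commonly cited ATX tolerance of +/-5% on the +12V, +5V, and +3.3V rails, and it assumes the readings have been exported as (timestamp, rail, volts) rows. The function name `out_of_tolerance` and the sample readings are hypothetical and are not taken from my log. Software-reported voltages are only approximate, so treat this as a rough filter, not a measurement.

```python
# Nominal rail voltages; +/-5% is the assumed ATX tolerance on the positive rails.
ATX_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def out_of_tolerance(samples):
    """Return the (timestamp, rail, volts) samples outside the tolerance window."""
    flagged = []
    for timestamp, rail, volts in samples:
        nominal = ATX_RAILS[rail]
        low = nominal * (1 - TOLERANCE)
        high = nominal * (1 + TOLERANCE)
        if not low <= volts <= high:
            flagged.append((timestamp, rail, volts))
    return flagged


# Hypothetical readings for illustration only.
samples = [
    ("12:00:01", "+12V", 12.10),
    ("12:00:02", "+12V", 11.30),
    ("12:00:02", "+5V", 5.02),
    ("12:00:03", "+3.3V", 3.50),
]

for row in out_of_tolerance(samples):
    print(row)
```

Comparing the flagged timestamps with the times of the hangs would show whether the two actually line up or whether the alerts are just sensor noise.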
Situational understanding: Time was taken to allow for driver updates to come out. During this time, the problem was narrowed down to coincide with video playback - browser (HTML5, Flash, Silverlight, etc.), VLC, media player, in-game (any game that had video playback), Xbox Video, Windows Movie Maker, Raptr, Kodi (XBMC), VideoPad Editor, and so on. Hardware acceleration did not matter. Sound output device did not matter. Catalyst Control Center settings did not matter.
Catalyst 15.7 released - the problem worsened: it now occurs every 30 sec to 5 min during video playback and is accompanied by "Display Driver has crashed and recovered successfully."
Tried installing Win7 - no improvement. Win10 Technical Preview - no improvement.
Hardware isolation: Installed the OS on a different HDD - no improvement. Individually tested each DIMM with Memtest86+ - all passed 24hr testing with zero errors. GPU testing was already covered during the RMA. Swapped the CPU for the only spare I had on hand, a Phenom II 945 (95W) - problem resolved.
Obviously I do not intend to run my system on such an outdated processor; I had to reduce my RAM frequency to 1333MHz, and this would be ignoring the problem, not solving it.
I have come to the following conclusion: either the CPU is going bad, or the motherboard is failing to handle a high-powered CPU (i.e., the motherboard is bad). Either way, both passed stress tests and benchmarks with flying colors as recently as 3 days ago - a characteristic that has made this problem especially difficult to diagnose.
I do not have any spare high-power CPUs on hand, nor do I have any spare motherboards capable of testing the FX-9370. I'm also resolved to wait until Zen comes out before buying any new CPUs, not that I can even afford to right now - an emergency expense (one of my cats died, but not before requiring lots of medical attention) ate all of the funds I had set aside for upcoming upgrades.
Before I pin down a definitive diagnosis, I need to be able to isolate the problem to one of these two devices.
Finally, the question: Given the explanation of events, how should I go about this?
RMA the motherboard?
Eat the loss of the $220 CPU and put up with the Phenom II 945 until Zen?
Something I've overlooked?
This is all from the log I kept, which may be incomplete - it did take 4 months to get to this point. I tried to include all relevant information, to avoid unnecessary questions.
I expect this won't be that helpful, but:
1) Make sure you have all the auxiliary power connectors populated - 8-pin, 4-pin, and peripheral (often called Molex).
2) Underclock the memory. Run it at DDR3-1600 speeds to see if it makes a difference.
3) Underclock the CPU. If it's a faulty CPU, that might work around it. If it's the motherboard unable to handle the CPU power draw, same result.
Thank you for replying. Unfortunately, all of the suggestions have been tried multiple times during these last 4 months.
1) The problem existed with two separate power supplies, both 80 PLUS certified and from trusted manufacturers. As for the current PSU, it is modular, so all connections are populated.
As for the connectors on the motherboard, yes, I've always populated every power plug... even the Molex plug above the PCIe #1 slot (I did own an ASUS A8N-SLI Deluxe long ago).
2) This has been thoroughly tested. Without AMP profile or custom settings, the AMD Gamer Series 2133MHz defaults to 1600MHz on this board.
3) This was also tried during the time that I was testing the CPU for stability at non-default clocks. I had underclocked it significantly in order to test the minimum voltage. I would link the guide I had been reading (it was on overclock.net) but my system is yet again reformatting and I am posting with my phone.
Furthering that point, last night I was toying with my Phenom II clock speeds and managed to bring it up to 3.5GHz stable. (It was late and I wasn't keen on really pushing it; the multiplier is locked on the 945, after all.) That is well above the lowest clock speed I dropped my 9370 to, at which it still exhibited the hangs. (I believe it was below 3GHz.)