
Processors

Acediaxeis
Adept I

Computer freeze into restart.

AMD Ryzen 7 5800X 8-Core Processor

AMD Radeon 6700 XT

It has happened 2 times so far, as far as I'm aware, though one of those times the PC had already restarted when I came back to it. I don't have any software open, but it just restarts sometimes. This is a 2-month-old PC build. Sometimes it makes weird noises as well, like a wheezing noise from a fan or the GPU (not sure which), and earlier today it made a weird, kind of crunchy fan noise; not sure how to explain it.

The error that shows up in event viewer is -

General

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

The details view of this entry contains further information.

Details -

 Provider
   [Name] Microsoft-Windows-WHEA-Logger
   [Guid] {c26c4f3c-3f66-4e99-8f8a-39405cfed220}
 EventID: 18
 Version: 0
 Level: 2
 Task: 0
 Opcode: 0
 Keywords: 0x8000000000000000
 TimeCreated
   [SystemTime] 2021-08-07T15:13:04.1501046Z
 EventRecordID: 33681
 Correlation
   [ActivityID] {20dd53db-8371-44a2-86b3-cf18589d57a7}
 Execution
   [ProcessID] 3780
   [ThreadID] 4256
 Channel: System
 Computer: DESKTOP-2FBKNCC
 Security
   [UserID] S-1-5-19
 EventData
  ErrorSource: 3
  ApicId: 0
  MCABank: 5
  MciStat: 0xbea0000000000108
  MciAddr: 0x1fff8055163dbbe
  MciMisc: 0xd01a0ffe00000000
  ErrorType: 9
  TransactionType: 2
  Participation: 256
  RequestType: 0
  MemorIO: 256
  MemHierarchyLvl: 0
  Timeout: 256
  OperationType: 256
  Channel: 256
  Length: 936
  RawData435045521002FFFFFFFF03000100000002000000A8030000370C0F00070815140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131FE6FF5E89C91C54CBA8865ABE14913BB42CC8BB19E8BD70102000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000001000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000001000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000100000000000000000000000000000000000000000000007F010000000000000002010000000000100FA2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000000000000000000100FA200000810000B32D87EFFFB8B170000000000000000000000000000000000000000000000000000000000000000F50157A5EFE3DE43AC72249B573FAD2C03000000000000009F00020600000000BEDB635105F8FF010000000000000000000000000000000000000000000000000200000002000000B1A1BCB29E8BD701000000000000000000000000000000000000000005000000080100000000A0BEBEDB635105F8FF0100000000FE0F1AD0000000000000000000000000B00005000000004D00000000F9010000230000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
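For anyone who wants to read the MciStat value rather than just paste it, here is a minimal Python sketch that pulls out the status flags and the low 16-bit error code. It assumes the architectural x86 machine-check layout (flags in bits 63 to 57, and a memory-hierarchy error code of the form 0000 0001 RRRR TTLL). The function name and the returned keys are my own, and vendor-specific bits are deliberately left undecoded.

```python
# Architectural MCi_STATUS flag positions (bits 63..57).
# Vendor-specific bits below these are intentionally not decoded here.
FLAGS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MciMisc holds valid data
    "ADDRV": 58,  # MciAddr holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split an MCi_STATUS value into flags and error-code fields (sketch)."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    code = status & 0xFFFF
    fields["error_code"] = code
    # Assumed encoding: memory-hierarchy errors look like 0000 0001 RRRR TTLL.
    if (code >> 8) == 0x01:
        fields["request"] = (code >> 4) & 0xF      # RRRR, 0 = generic error
        fields["transaction"] = (code >> 2) & 0x3  # TT, 2 = generic
        fields["level"] = code & 0x3               # LL, 0 = level 0
    return fields


if __name__ == "__main__":
    for key, value in decode_mci_status(0xBEA0000000000108).items():
        print(f"{key}: {value}")
```

For the value in this event, the sketch reports a valid, uncorrected error with the processor-context-corrupt flag set, and the transaction, request, and level fields line up with the TransactionType 2, RequestType 0, and MemHierarchyLvl 0 values that Windows logged.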

 

I also see a bunch of other errors and warnings in there.

One warning is this:

An error was detected on device \Device\Harddisk1\DR1 during a paging operation.

Another warning is this

The system failed to flush data to the transaction log. Corruption may occur in VolumeId: C:, DeviceName: \Device\HarddiskVolume3.

Failure status: The specified request is not a valid operation for the target device.

Device GUID: {b9764289-ae1b-6144-ed16-e7f5d682c03f}
Device manufacturer:
Device model: KINGSTON SA2000M81000G
Device revision: S5Z42105
Device serial number: 0026_B768_48DA_8F05.
Bus type: NVMe

And 1 more warning 

The application-specific permission settings do not grant Local Launch permission for the COM Server application with CLSID
Windows.SecurityCenter.SecurityAppBroker
and APPID
Unavailable
to the user NT AUTHORITY\SYSTEM SID (S-1-5-18) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.

Adapter serial number: 50026B76848DA8F0 _0001

1 Error is this

The LightingService service terminated unexpectedly. It has done this 1 time(s).

another error is this

The SysMain service terminated with the following error:
The parameter is incorrect.

I'm not sure if any of these errors or warnings have anything to do with my other issues. I'm going to call the people I bought the computer from sometime this week, but right now I kind of want to know if it can be fixed before I do anything else.

 

0 Likes
9 Replies
Gwillakers
Challenger

This is normally an overclocking error.

Examine the APIC id in your WHEA error entry.

Over time, see whether the APIC IDs are the same (a problem with one core) or different (probably a problem with memory).

This can be caused by running your memory too fast (or sometimes too hot),

or it can be caused by running your cores too fast.

Back off any overclocks you might be running (core multiplier or memory multiplier, or relax the memory timings).
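If you collect more than a couple of these events, tallying the APIC IDs by hand gets tedious. Below is a rough Python sketch of the comparison described above. It assumes you have exported the WHEA-Logger events from Event Viewer as XML and wrapped them in a single root element; the `<Events>` wrapper, the sample data, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical sample: two events wrapped in an assumed <Events> root.
SAMPLE = """<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData><Data Name="ApicId">0</Data><Data Name="MCABank">5</Data></EventData>
</Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData><Data Name="ApicId">0</Data><Data Name="MCABank">5</Data></EventData>
</Event>
</Events>"""


def apic_id_counts(xml_text: str) -> Counter:
    """Count how many events were reported for each APIC ID."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for data in root.iterfind(".//e:EventData/e:Data[@Name='ApicId']", NS):
        counts[(data.text or "").strip()] += 1
    return counts


def always_same_core(counts: Counter) -> bool:
    """True when every logged error came from the same APIC ID."""
    return len(counts) == 1


if __name__ == "__main__":
    counts = apic_id_counts(SAMPLE)
    print(dict(counts), "same core every time:", always_same_core(counts))
```

One ID showing up every time points toward a single core, while a spread of IDs points more toward memory, as described above. Two events is far too small a sample to conclude much either way.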

0 Likes

Hello, thanks for the response. I'm not running any overclock currently, though I heard there is a setting in the BIOS that could fix it. Core Performance Boost, I think it was called. It's still on, but I don't know.

It just says APIC ID 0. I searched this up, and from what it sounds like, it's core 1 it's happening to? It has happened twice, and both times the APIC ID was 0. It didn't happen in the first month and a half of having the computer, but now it has happened 2 times within the span of a week.

I have also heard that a solution would be to update the BIOS, but I'll keep that as a last resort, since I have not done it before and I don't want to mess it up.

I might ask a professional to do it for me if it comes to that.

0 Likes

This is an issue that many of us are having, with no answers. In my case and many others, it's a fresh build with new Windows, default drivers, no overclocking whatsoever, and no settings changed. You can update your BIOS. It hasn't helped anyone else that I can find, and I've been crawling the hundreds of threads on the internet. You can check yourself: it's all over Reddit, Tom's Hardware, TweakTown, the AMD community, the Gigabyte forums, the MSI forums, and many others. People often incorrectly blame memory, GPU, PSU, running applications, or overclock settings, but in most of the cases those did not apply at all, and even when they did, changing them didn't solve anything.

 

It seems RMA is the only answer, but one guy claims he RMA'd three times and every 5950X he got had the same problem. In his case we can say it's probably not the CPU, but honestly the only common thread I see with this error is the AMD 5950X.

 

However, some people get it on many different CPU IDs, seemingly at random. I always <always> get the error on CPU ID 0 or 1.

 

If you can figure out a way to resolve this without an RMA, so that I don't have to take apart the cooling system and replace the processor, I'll Venmo you $100.

0 Likes

Well, this issue doesn't happen often for me at all.

It has happened 2 times; the first occurrence was a month and a half after I got the computer.

The second one was about a week later. So I would assume that something new is making this freeze-and-crash occur, as otherwise it should have happened sooner.

I have now updated my BIOS. One thing I noticed in my previous BIOS settings was a setting named "overclock" that was set to on; I set that to auto as well. I won't be able to replicate the crash, since I don't know how. I kind of just have to wait and see if it is fixed or not, because at this point I just don't know.

Edit: I also turned off Precision Core Boost in the settings, which I had seen to be a workable solution for some, and I have also seen that the BIOS update was a workable solution for some. If you wish, I could try to find the sources for these at a later date; they should not be too difficult to find.

0 Likes

It helps to understand what some of these things are, so you know where to look, and where you will be wasting your time looking.

Machine checks have been around forever. They basically are a way for the CPU chip to tell you it could not do something that was being asked of it. Some examples:

#1 Run the next instruction, but the binary data at the next instruction location does not form an instruction (i.e. it is garbage). So the machine throws a machine check of the type "Operation" exception.

#2 The CPU has a valid instruction that says add fields A and B. In field A is the number 5, but in field B is the letter W. So the CPU throws a machine check of the type "Data" exception.

#3 The CPU has a valid instruction that says add the number at address x'00000000' to the number at address x'12345678'. However, the program issuing that instruction does not own the page at address x'00000000'. So the machine throws a machine check of the type "Protection" exception. It doesn't want a virus running in one space to access the data owned by your spreadsheet software.

#4 The CPU has an instruction that references address x'FF2E4BC6', but that address does not exist (What, are you too cheap to buy 4 gigs of memory? lol, that's a joke, son). The machine will throw an exception of the type "Addressing".

******* Now I just gave you a bunch of old time stuff.

To tell you the truth, I don't know what the blazes a "Cache Hierarchy" exception is. However, I do know that all the caches in today's computers are physically located on the chip itself. Level 1, Level 2, and Level 3 (both instruction and data cache) all reside in the die under the heat spreader.

Thus your problem is with the CPU chip itself. It does not mean it is broken. It just means the CPU has been told to do something it cannot do. Even more specifically, you have a problem with the cache in the CPU. (You should now be thinking VSoc voltage.)

******** Now I need to go off on a tangent and talk about masters issuing tasks and waiting for slaves to accomplish them.

There are 3 types of waiting:

#1 Polling:    I the Boss give you a piece of work to do, and I keep asking you if you are done.   Are you done...   Are you done... Are you done... Are you done... Are you done... Are you done... Are you done... Are you done... Are you done...                     This is very easy to program, but very wasteful of the Boss's time.

#2 Multitasking and interrupt handling: I give you, and maybe others like you, pieces of work to do. I go off and do what I want, and when you are finished, you interrupt me. I stop what I'm doing, I take the information from you, and you go off and wait for another assignment. As you have done your task (I now have the Miller contracts), I am free to re-prioritize the work before me. This is more complicated to program, but does not waste the Boss's time.

#3 Contractual agreement: Both the CPU and memory are way too busy to either poll or take interrupts (I know I'm taking liberties here, but stay with me). Suppose you had two people, each in a separate room. They cannot see each other, but there is a hallway connecting both rooms. This hallway, for all practical purposes, can be an electrical bus (just a wire trace on a motherboard seen by both the CPU and the memory sticks). There is a guy at the end of the hallway who is flicking a light switch on at a certain rate. The guy in one room (the CPU) comes to an agreement with the guy in the second room (the memory) that when he wants something from memory, he will give the memory 5 blinks of the light to get it out of memory and put it into the hall. They also agree that when the CPU wants something stored in the memory room, the memory manager will have 6 blinks of the light to take it out of the hallway and store it. Then they tell the guy at the end of the hall how fast to flick the switch.

From what I understand, there isn't really anything you can do to make the memory manager work faster. It is possible that the memory manager fetches things and puts them in the hallway in just 4 blinks instead of 5. Gee, you guys are wasting a blink of the light every time you want something read. Let's also say that the memory manager stores things in his room in only 4 blinks as well. Gee, you are also wasting 2 blinks every time you want something stored.

Remember, the CPU is much faster than the memory sticks.   The CPU waits for memory.   The memory has always been a big bottleneck where the CPU is concerned.  (That's why it is always better to buy a CPU chip with a bigger cache than to waste your money on fancy Memory sticks).    

So if the CPU and the Memory come to a new agreement, that only 4 blinks should be allotted for Reads and Writes, then your CPU will run faster.

However, It might have dawned on you to tell the guy at the end of the hall to switch the light faster.   If you do so, and you keep the Contract of 4 blinks per read and 4 blinks per write, the Memory manager might not have the time to complete his job and you wind up with Garbage.
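To put rough numbers on the blinking-light picture: the time the memory gets is the number of blinks divided by the blink rate. Here is a small Python sketch of that arithmetic. The clock speeds and the 2.4 ns the memory "really needs" are made-up illustration values, not measurements from any real kit, and the function names are mine.

```python
def window_ns(blinks: int, clock_mhz: float) -> float:
    """Time allowed for an operation: blinks (clock cycles) / clock rate."""
    return blinks * 1000.0 / clock_mhz  # 1 cycle at 1 MHz lasts 1000 ns


def keeps_contract(needed_ns: float, blinks: int, clock_mhz: float) -> bool:
    """True if the memory can finish inside the agreed number of blinks."""
    return window_ns(blinks, clock_mhz) >= needed_ns


if __name__ == "__main__":
    NEEDED_NS = 2.4  # hypothetical time the memory really needs per read
    for blinks, mhz in [(5, 1600), (4, 1600), (4, 1800)]:
        ok = keeps_contract(NEEDED_NS, blinks, mhz)
        print(f"{blinks} blinks at {mhz} MHz = {window_ns(blinks, mhz):.3f} ns, "
              f"enough time: {ok}")
```

With these assumed numbers, tightening from 5 blinks to 4 at the same switch speed still leaves enough time, but keeping 4 blinks while flicking the switch faster does not, and that last case is the one where you end up with garbage.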

==============

So ultimately my advice is:

1. Set VSoc to Normal and add a very gentle positive differential voltage (like .006 or .012).

2. If you are using XMP or DOCP, consider turning those off, as they affect FCLK and may be contributing to running the internal memory controller faster than it can really work. Remember, the Ryzen IMC is specced at 3200MHz.

PS. You can keep your $100, I do this for fun. And I have made a good living doing things I love.

0 Likes

Thanks for the detailed info. However, many others have suggested these options. Also, I've never had XMP or DOCP turned on.

I kind of agree with others who say this should be stable with default settings and the latest firmware, without any tweaking. The components are extremely expensive, and many cheaper CPUs do not have these issues at all. So in the end, something is wrong, somewhere. These are <BIG> problems. Random uncontrolled reboots are absolutely f**ked for top-of-the-line processors. It's unacceptable that it's not being addressed. Before I found tons of info on the internet, I thought maybe it was built incorrectly in some way.

That being said, based on your advice I changed VCORE SOC from "Auto" to Normal and added a dynamic +0.00625 differential (I assumed that's what you meant?).

Now to sit and wait anywhere from 1 to 24 hours for a reboot. If it reboots, I will try a higher differential.

kneel420_0-1628861059418.png

Also, not sure if it matters, but my crashes are always on CPU ID 0 or 1, never any of the others.

 

 

Also, the OP of this thread said it only happened twice. That's super lucky!! It's happening for me all the time. This is literally a brand new system with absolutely nothing changed yet except updated firmware, which I hoped would fix the issue (it didn't).

kneel420_0-1628861802882.png

 

0 Likes

You did the right thing.

When you have a system that manages not only to POST but also to get into Windows, yet still crashes sometimes, you have a system that is on the edge of a cliff. If you were standing on the edge looking over, you too might just fall off and crash. However, if you take even the slightest step in the right direction, you would be much safer and probably not fall (crash).

This problem is all about whether the memory controller can handle the work put in front of it. Memory frequency, latency settings, and even voltages (VCore and VSoc) can make all the difference.

Even though these processors, motherboards, and BIOSes are new and expensive, those manufacturers are pushing you as close to the edge as possible. They want you to enjoy your experience.

The problem is, if you move some of these settings in the wrong direction, then you might have a problem where the  system may not post even into BIOS.   I have an ITX board, and getting at the CMOS Clear is problematic.   This resulted in me doing a bios flashback.  (not the most trusted operation)

So the key here is:   ALWAYS TAKE LITTLE STEPS.    Have your system profile saved in case you have to clear CMOS

Know how bad or unsafe you are.

         Crash once in a while

          Crash frequently

          Can't get into windows at all

          Can't get into BIOS.

 Your job is to make minor adjustments so that you travel up this list, not downward.

0 Likes

Yes, right now I'm only taking small steps. I updated my BIOS as one option, but that's not really a dangerous thing to do unless the power goes out, and my motherboard has flashback in case something happens.

So if it crashes again, I'll try something else and then wait for another crash, if it comes. We'll see.

But yeah, I haven't had any crashes since I updated my BIOS, though I've only had 2, so that doesn't say much.

0 Likes

A rather long explanation, but I kind of understood it. Having just finished IT school, I haven't learned all these details yet.

But if the freeze happens again, even after the BIOS update and the other things I mentioned above, I will try the options you listed at the bottom. For now I will wait and see.

I'm pretty sure it will happen again, though, as when I try to fix things like this, it usually just ends with me hitting a roadblock. So we'll just have to wait and see.

0 Likes