I have seen many of these posts...and tried so many things.
My hardware:
Some Details and the Initial Problem
Things I have tried
Things I find noteworthy
The Testing
I ran the testing two ways: once with CPB and XMP on, and once with both off. Neither made any difference in any of the tests.
With everything at defaults except ZeroRPM disabled and a much more aggressive fan curve, the card ran noticeably longer before hitting this error. "Noticeably" here means it would sometimes get through the 60-second stress test, but never a second time. In most cases it still failed and green-screened during the first test: with ZeroRPM on and no custom fan curve it failed within the first 20 seconds, and with ZeroRPM off it usually got past 30.
Lowering the voltage had no effect. I think I could go down to about 980 mV. I tried many different values.
Lowering the GPU clock had no effect. I tried 2100, 2000, ..., 1500 MHz and found nothing remarkable at any of them.
The memory clock could not be lowered and stayed at 2000 MHz. The only tuning available was to increase the VRAM clock, which did not seem likely to help with this issue, so I left those values alone.
I tried raising the power limit by different percentages, up to about 15% (which I think was the max), and saw no change.
I feel that running the same tests and playing the same games on the 5700 XT without problems removes most of the possibility that this is coming from something else in my system. For instance, if the power draw of the 6800 XT at a much lower GPU clock is the same as or lower than that of my 5700 XT, the power supply should not be the issue, since the crashing still occurred with no change. Also, since the downclocked 6800 XT is not pushing more than the 5700 XT, it should not be something the CPU or RAM cannot handle, and the same goes for the board: if the GPU is doing less, there is less for the board to handle. Basically, in testing I am reducing the 6800 XT to something below the 5700 XT, and the issues still occur at the same rate as with the 6800 XT at defaults. That changes a little when turning off ZeroRPM and raising the fan curve, but only slightly.
Lastly, I decided to make a partition on the WD 1 TB drive, install Windows there, and test on a fresh install. I did not expect it to solve the issue, since I get resets even while looking through BIOS settings with the 6800 XT installed (no green screen there, but I do see some vertical artifacts before it resets). During the Windows installation, the computer reset. I install Windows all the time, and this was not one of the normal planned restarts. Eventually I got Windows installed, and the green screen occurred before I could even install the chipset drivers. I pressed on and got everything installed, but still could not pass a stress test. So the issue is not Windows, and I do not believe it is the drivers either. The crashing was so bad that I had to put the 5700 XT back in just to remove the partition and fix my boot.
Things I know about, but haven't tried
I have a ticket with Gigabyte over this...but it sucks. I had to overspend by hundreds to get this card, and if they RMA it and the same crap happens....ungghh....don't even want to think of that.
So does anyone have anything else they think may help in figuring this out? The Cache Hierarchy error, WHEA-Logger Event 18 with a processor APIC ID [n], seems to be a common issue. Many people have tried many things, and there is never a concrete answer. I am of the thinking that this card is truly defective. But, at the same time, more minds on a problem are better than one.
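For anyone who wants to pull these events out of the log rather than scroll through Event Viewer, here is a rough sketch that builds a `wevtutil` query for WHEA-Logger Event 18 in the System log. It assumes the provider is named `Microsoft-Windows-WHEA-Logger`, which is how it shows up on my understanding of Windows 10; the function names are made up for illustration, and it only works on Windows.

```python
import subprocess

# Assumption: WHEA hardware errors are logged to the System log by this provider.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(event_id=18, max_events=10):
    """Build the wevtutil argument list for the most recent WHEA events."""
    xpath = (
        f"*[System[Provider[@Name='{WHEA_PROVIDER}'] "
        f"and (EventID={event_id})]]"
    )
    return [
        "wevtutil", "qe", "System",   # query events from the System log
        f"/q:{xpath}",                # XPath filter: provider + event ID
        "/f:text",                    # human-readable output
        f"/c:{max_events}",           # limit the number of events returned
        "/rd:true",                   # newest events first
    ]


def recent_whea_events(event_id=18, max_events=10):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_whea_query(event_id, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `print(recent_whea_events())` from a Windows command prompt should list the latest entries, including the APIC ID each one reports. Comparing timestamps against when the crashes happened is the useful part; the APIC ID alone does not say which component is at fault.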
Finally got my new card from Gigabyte.
Zero issues.
I have played RDR2 for about an hour, FFXV for about 2 hours, and Asgard's Wrath VR for about 6 hours. I ran the Heaven benchmark, turned up as far as it would go, for about 30 minutes, and ran the 1-minute Radeon stress test multiple times. My machine has been on for about 24 hours now.
Previously, none of this would have worked with my original 6800 XT installed, not even leaving the machine powered on.
The error reported was a Cache Hierarchy WHEA Event 18 listing processor APIC IDs, which leads people to believe it is a CPU issue.
For me, that was not the case, and I was certain of it before sending my 6800 XT in, because my 5700 XT worked fine on the same machine.
Now my replacement 6800 XT also works fine on the same system.
If you are experiencing Event 18 errors, there is a good chance they are related to your card/GPU.
Solution: I sent the card back to the manufacturer, who tested it, found memory and power issues, and sent me a new card that works without issue.
Hopefully this helps.
More testing.....
Removed my 6800 XT. With the 5700 XT installed, I ran DDU in safe mode (first option) and then restarted. I downloaded the next-to-latest WHQL Radeon drivers and let the installer extract them. If I let the Radeon installer continue and choose "driver only", it does not help once I put the 6800 XT back in. But if I close the installer, update the display adapter in Device Manager from the path where the Radeon extractor saved the drivers, turn the machine off, put the 6800 XT back in, and then repeat the same steps with the newest WHQL drivers (updating the 6800 XT through Device Manager), I am able to use the card sparingly, as long as Afterburner loads and sets a much higher fan curve. Without Afterburner and the much higher fan curve, I cannot pass any tests or benchmarks.
I say sparingly because I can pass a 1080p Heaven run for 15 minutes, but as soon as I set it to 1440p (on a card that should handle 4K) it green-screens. Running RDR2 at 1080p, I was able to run around for a while without a crash, but move that up to 1440p and it's green screen time. Any VR I try crashes within minutes.
I removed my M.2 NVMe drives, let everything cool down, and installed a SATA HD (yes, HD, I still have a bunch of those). I pulled the CMOS battery, waited a bit, and put it back. I then started the machine with the 6800 XT and could not make it through a Windows install without the machine rebooting. I put the 5700 XT back in and installed Windows with no issues.
My hope here was that the M.2 NVMe drives were somehow causing the problem...apparently they are not.
The card still causes reboots even when I am in the BIOS, which I do not experience at all with the 5700 XT installed.
I am guessing my next test is to try this other X570 Gigabyte Aorus Elite board, which I would rather not open...I would rather sell it. Alternatively, I could take the power supply from this machine and put it in one of my other computers (which are all Intel, but older) and see if the crashing occurs. But if it does not, that does not really tell me much, except that the card is not outright faulty....though it still may have a problem with AMD CPUs, I guess.
I hate this. If I RMA the card with Gigabyte, it is either bad or not, and they return either a fixed card, a new card, or the same card that they found to be working but that still will not work for me. If I RMA with Newegg, I either get the full amount I paid if the card is defective per their tests, or what I paid minus a restocking fee, which is outrageous at $250 per item. So Newegg = money, or the same card back that is not working for me; Gigabyte = a fixed card, or the same card that is not working for me. And if I send it to Gigabyte, it would not return in time for me to send it back to Newegg.
Anyone else? I see other posts with the same info, but they do not seem to be doing as much testing to figure this out. I also do not see anyone mentioning the system restarting even in the BIOS.
Additional Note:
On all these restarts when I am in bios or trying to install windows (I have tried in more ways than I wrote so far), the 6800 xt fans spin at startup, then stop. The back of the card is hot to the touch...not burning, but still hot, and yet the fans never come on.
This is interesting in relation to some other things:
When using Adrenalin, if I could get to the manual settings before the green screen happened, turn off ZeroRPM, and set the curve higher, the fans would start spinning, and I do not think I got a green screen until I ran tests that further stressed the card.
If I start with Afterburner and have it apply the higher fan curve, I do not get a green screen and the fans are spinning.
After a green screen, if I touch the card, it's hot. To me that may mean that when I am getting reboots in the BIOS and while installing Windows, the card's VBIOS is telling it not to turn the fans on, which Afterburner or Adrenalin overrides if I can get that far.
According to the Gigabyte tech support agent I spoke to, there is no VBIOS update for this card.
So...if the card is hot, I imagine the junction temps are pretty high. But without software telling them to do so, the fans do not kick on.
I keep leaning more and more toward this being a specifically bad card. But if I send it off to Gigabyte, I am not sure how thoroughly they will test it. If they just throw it in a machine, open some crap game, and it works, then they are sending the same card back and it's outside my window for return. I guess I could send a note with it explaining the issue in more detail than their small, few-sentence RMA textbox allows.
I think this is sealing it as a bad card, or bad advice from AMD...not sure which.
I just installed the 6800 xt to a bit older machine, but still one that should be powerful enough:
Intel Core i7-4790 @ 3.6 GHz
32 GB of RAM (older DDR3)
Gigabyte H97M-D3H board (same brand as the card)
Corsair CX 750
So I put the card in and started up the machine. I had it connected via DisplayPort since I did not have an HDMI cable here at work. I have typically been using HDMI at home.
I went to download the drivers from AMD: black screen. The machine did not reboot....but it did not do anything else either. I turned off the machine, and the card was super hot. The fans never spun. I cannot risk my server at work any more than I already have. But I think two machines, both capable of running this card, crashing without even stressing it is a surefire indication that this card is bad. The fans not spinning...maybe that is part of the VBIOS, but I cannot see why they would not spin when the back of the card is so hot.
I will be RMA'ing this thing with Gigabyte and including all of my tests. If they come back with "It works for us", then I am pretty sure we are all being scammed by someone, be that AMD or the card manufacturers. I am only semi-serious there.
However, the one thing these tests have in common is a 750-watt power supply: mine is a Corsair HX750 Platinum, and the one at work is a lesser Corsair CX750 (Bronze, I think). But per everything I have read in official documentation and on PCPartPicker (not anecdotes or forum opinions), 750 watts, especially Gold or above, should be enough. And remember, this card is crashing without even doing anything, so nowhere near 300 watts are being used, and when I can get it running for a while on my machine, it lasts at least 10 minutes at 1080p. So, are these cards capable of running on 750 watts? I would love to hear from someone actively doing that without any issues. Being generous, if I say 80 for the board, 10 for 2 RAM sticks, 125 for the CPU, 10 for the CPU fan, 10 for 3 case fans, and 50 for 2 NVMe drives, that is 285. I also tried with one HD instead of the NVMe drives, which draws less. That leaves at minimum 465 watts. I cannot believe that a non-stressed card would fail for lack of wattage, and remember that the numbers above are for a system under stress, so the 285 is likely quite high.
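To make the arithmetic easy to redo with other numbers, here is a minimal sketch of the same budget. The per-component wattages are the generous guesses quoted above, not measurements, and the 300 W figure for the card is only the rough number mentioned above; the function name is just for illustration.

```python
# Rough power budget using the estimates from the post (guesses, not measurements).
PSU_WATTS = 750
GPU_ROUGH_WATTS = 300  # assumption: the rough full-load figure mentioned in the post

# Deliberately generous per-component estimates in watts.
component_watts = {
    "motherboard": 80,
    "2 RAM sticks": 10,
    "CPU": 125,
    "CPU fan": 10,
    "3 case fans": 10,
    "2 NVMe drives": 50,
}


def gpu_headroom(psu_watts, components):
    """Return the watts left for the GPU after the rest of the system."""
    return psu_watts - sum(components.values())


system_total = sum(component_watts.values())
headroom = gpu_headroom(PSU_WATTS, component_watts)
margin = headroom - GPU_ROUGH_WATTS

print(f"System without GPU: {system_total} W")
print(f"Left for the GPU:   {headroom} W")
print(f"Margin at ~{GPU_ROUGH_WATTS} W GPU load: {margin} W")
```

With these inputs the system comes to 285 W, leaving 465 W for the card, so even a card pulling around 300 W would have about 165 W to spare. This ignores transient spikes and PSU aging, so it is a sanity check rather than proof.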
I guess that is it for testing. I will be sending the card to gigabyte and post back when I hear back from them.
Your issue and experience is so much like mine, that I had to steal your formatting for my initial post. Hopefully someone gets to my ticket with Gigabyte some time soon.
Gigabyte tested my card and found 2 issues. One was the memory module; I could not understand what the guy said about the other, but it sounded like it was related to power. Either way, the card itself was bad and they are sending a new one. I anticipate some type of document at some point that better explains the issues, and I will post that here.
Who knows how long that will take, but I am hopeful the issue will be resolved when I get the new one.
I will post back when I get the new one.
How long did it take before Gigabyte responded to your ticket? I opened one on the 26th, but haven't heard anything from them yet.
The RMA came via email in about 2 days. I did call and speak with a tech beforehand and explained the issue, and they said to RMA it...maybe that sped it up some?
They said that RMAs can take 3-5 days, I think, that shipping can take 2 weeks, and that they were 1-2 weeks out on inspection and testing. Mine seemed to move through the system quite a bit faster, but maybe plan around those timelines? They also said those were typical times right now and that, depending on the workload, it could take longer.
It has been about 18 days since they tested the card and found it had issues. They said they were about 2 weeks out, but I called today and they said they are not sure as it could take longer.
Also, they do not give more detail to the problems they found, so the best detail I am getting is "a bad memory module" and "a power issue on a chip" which is literally all the technicians wrote.
Somewhat irritated that A) I am not going to get more information about what went wrong to help others and B) That I am sure they have sold more of this same card in the time I am waiting to receive mine.
If I owed the bank $1000 and said I wanted to buy $1000 worth of peanuts instead, the bank would not care: I have a debt to them, and I would be fined for not honoring that obligation before doing something else. I feel the same logic applies here. Gigabyte should be obligated to withhold a sale in order to settle their debt to me. I have already paid for their product, and yet I have no product. What is an acceptable time frame here? Each day or week or month is one I do not get to enjoy the product I paid for during the time that product is meant to be used. What if it takes 6 months...a year... I paid for a card that would be high end for the immediate foreseeable future, and if it takes too long, I am effectively paying for a used-to-be high-end card, or at least a high-end card with limited time left as high end. I realize that is a bit drastic and probably unlikely, and it's more likely I will get the card in the next few weeks, but I needed to vent my disappointment.
Update.
So, I sent my card in on 5-19, and they got it on 5-28. On 6-04 they said they had tested it and that I would be getting another card. Then, on 6-19, they changed the "fix date" to 6-19 instead of 6-04.
It's been about 30 days since then...still nothing. I called about 2 weeks ago, and they said they had no estimate because they did not have any stock to replace my card with.
I will definitely be posting back when I get my card...but...at this point, I have to say this RMA experience has been a very negative one in terms of the time it is taking.
If anyone else is considering RMA'ing a 6000-series card, get ready for a long, long wait.
CHECK THIS,,,
I found the same on MSI, ASUS, ASRock, ZOTAC, etc.
I had issues just like the 1000s of posts I've seen so far.. just repaste and it all goes away.. when parts of the chip HAVEN'T HAD THERMAL PASTE ON THEM FROM THE FACTORY, you will get black screens and weird power issues, depending on which parts of the chip are exposed...
Here is my XFX RX 6800 QICK. XFX had already SANDED the GPU die (showing it's an MCM) before I sanded the RAM chip surfaces and VRM surfaces, allowing better thermal contact with the metal under the plastic... Cut the thermal pads: you can save some heat wash by separating the VRM and RAM pads instead of having them share one BIG piece of thermal pad, and put some MX-5 on there...
DON'T BE SCARED, GO SLOW, MAKE ROOM AND TAKE YOUR TIME...
My issue was apparent even on cold starts, even in the BIOS. To me that suggests the problem could not be solved by cooling fixes alone. Also, if my product is still under warranty, sanding the dies may void the warranty for future issues, since the manufacturer could claim I created the issue by sanding. I could be wrong; I guess that would be defined in each manufacturer's warranty terms. I would rather exhaust the warranty on a new product first anyway.
Though it did seem to get worse as more heat was generated, so it's possible, I guess, that reapplying thermal pads/paste would have helped.
Gigabyte specifically said a defect in the memory and a power issue were the cause. While it's possible it was poorly cast dies and shoddy thermal application and they are only saying otherwise to save face, I would like to think that, with the evidence of my testing along with theirs, heat is probably not the main factor in my issue. Had I taken the brand-new, out-of-the-box card apart and voided my warranty, and the issue was in fact bad memory, I would be out the cost of the card.
Personally, I would not recommend anyone take their card apart without checking their warranty terms first and being reasonably sure that sanding and thermal reapplication would help their issue. Given that the many posts on many sites about this same type of issue report many different solutions, from changing the CPU or RAM to changing the PSU or GPU, I think it's hard to gather enough evidence that cooling fixes are the answer, and I would not want to apply that globally to all instances of this problem. Who knows, maybe in the future we all find out it's been shoddy thermals the whole time.
After completing the BREAK-IN process ("7nm takes 10x as long to break in as 12nm"; also, VRMs can be made of dense materials such as cobalt and other metals, meaning they might have to warm up for their physical properties to be correct)... THEY WILL BE NOISY. THIS IS OK, they will warm and settle.
65c to 75c is the perfect temp range for all the 6000 series...
RUN FURMARK "LONG TIME MANY TIMES"
RUN RAYTRACING GAMES "JUST SIT THERE OR MAKE A MACRO TO SHAKE THE CAMERA in the lowest frame rate ZONES"
RUN PHYSICS BENCHMARKS IN CONTINUOUS... until the benchmark stops climbing...
MONITOR TEMPS
and this is what u get to play online with...
Hi
Do you have any references that back up the idea that 65 to 75 is ideal for the 6000 series cards?
I have constant CTDs with MSFS and I am looking for any information that might give me some clue as to what is going on.
At the moment I am running my 6800 at 1800 MHz and 940 mV, and it runs at about 120 W and 58 degrees.
I thought this might help but it hasn't so far.
Maybe I need to heat it up some more.
If the claim is that a 6800 card should ALWAYS run in the 65-75 range, that claim is likely false. We know this because cooling the same card with liquid nitrogen to much lower temps is how people get insane benchmarks. If the claim is instead "my system runs my card well in the 65-75 range", then that can be true. Electronics typically run better in cooler conditions than in hot ones, because electrical resistance and leakage rise with temperature, which limits the clocks a chip can hold stably.
But what I gather is being described here is the break-in for thermal pastes and pads. That is a real thing, and it helps the spread. The whole point of sanding is to flatten any angled imperfections in the head (the die surface) and to create more surface area in the form of extremely small yet level scratches on the head. The paste, if applied cool and then run cool, may never reach a temperature that lets it seep into the head or spread across it as much as it could, which results in less surface coverage. Less surface coverage means less heat transfer. Less heat transfer means more heat saturation, which in turn means a slower product.
So in the beginning, it's a good idea to use some stress to heat it up a bit, and maybe even more stress after that. Then turn everything off and let it cool. After a few repetitions of this, the paste should be set and have reached its max surface coverage. After that, though, a GPU or CPU has no specific temperature range it needs to run in (it does have a manufacturer-tested maximum); generally, the colder the better. But applying paste or pads and then immediately running a die cool could cause less coverage and so more heat saturation. It should also be said that most modern pastes do not need to be broken in the way older pastes did.
That makes sense. Thanks for that.
My card runs very well, fast and cool and the only time I have any problems is when I use MSFS.
It crashes to desktop nearly every time but it is dwm or msfs itself that crashes not usually radeon.
My card idles at just a few watts and in the 30s temperature wise.
I just wish I knew whether the problem is gpu related or MSFS related.
Hi
The idea of breaking in a GPU is new to me. Do you have any sources where I could read up on this in more depth?
As I said in my previous question, I have been running my 6800 cool. After reading your post I turned it up to 2400 MHz and 940 mV and backed off on my fan settings. It ran at between 200 W and 210 W with temps from 60 to 80.
I had a very good flight in MSFS and no CTD, which is very unusual for my setup.
What period are you talking about for running in? Perhaps mine hasn't run in at all because I've been keeping it cool with my fan settings.
Any suggestions as to where I might find info on this that you regard as reliable?
I have constant green screen crashes while using HDMI on my TUF 6800 XT. After swapping it for my old GTX 1070, also on HDMI, the green screen crashes stopped completely. For me, all signs point to a half-baked or bad AMD driver, because everything was working fine before.