This is a copy of a comment I made on the r/AMD support megathread.
Summary of the grievance: Sometimes games or other programs will crash due to a failure to allocate memory. This happens at around 15GB of GPU memory usage (of the 20GB in total) on my AM5 & RX 7900 XT system.
The questionable hardware: https://ca.pcpartpicker.com/b/9pz7YJ (Windows 11)
I can reproduce this pretty easily by starting up something like "(GL) FurMark-Donut-5200MB" twice. At that point, I'm using 12GB of GPU memory (system usage is 16GB with 14.6GB available). If I try to start up a third one, Kombustor will fail with std::bad_alloc or something similar. There is available system memory and GPU memory, but the allocation fails. Or, instead of starting a third Kombustor, I can allocate system memory (less than half of what's reported as available) as part of a memory test (like TechPowerUp MemTest64 or HCI MemTest), and that fails too.
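For anyone who wants to reproduce the system-memory half of this without a dedicated memory tester, here is a minimal sketch in Python. It is not what MemTest64 or HCI MemTest do internally; `probe_system_allocation` and its parameters are hypothetical names made up for illustration. It allocates memory in chunks, touches one byte per assumed 4096-byte page so the OS has to back the memory, and reports how much it was holding when an allocation failed or the limit was reached.

```python
def probe_system_allocation(limit_mb, chunk_mb=256, page=4096):
    """Allocate and touch memory in chunks.

    Returns the number of MB held when an allocation failed,
    or when the next chunk would exceed limit_mb.
    """
    held = []          # keep references so chunks are not freed early
    allocated_mb = 0
    try:
        while allocated_mb + chunk_mb <= limit_mb:
            buf = bytearray(chunk_mb * 1024 * 1024)
            # Write one byte per page (page size is an assumption)
            # so the memory is actually backed, not just reserved.
            buf[::page] = b"\x01" * len(range(0, len(buf), page))
            held.append(buf)
            allocated_mb += chunk_mb
    except MemoryError:
        pass           # allocation failed: report what we held so far
    finally:
        held.clear()   # release everything before returning
    return allocated_mb
```

Calling something like `probe_system_allocation(16000)` once with the GPU idle and once with two FurMark instances running should show whether the ceiling moves with GPU memory usage.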
I did that on stock hardware settings. No CPU or GPU overclock and DDR5 at stock 4800 MHz or whatever with no profile enabled. I had one monitor connected to the machine through the GPU under test. No variable refresh rate. The machine demon lingers in spite of those efforts.
I can allocate all 32GB of system memory when GPU memory usage is low. At that point Windows starts using the page file. That's what I expect it to do normally, even when the GPU is in use.
I've had this problem since I got this machine in March of last year.
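Since the page file comes into it, one thing that might be worth comparing is physical memory versus commit. The sketch below reads both through the Win32 `GlobalMemoryStatusEx` call, where `ullTotalPageFile` and `ullAvailPageFile` are documented as the commit limit and the amount the current process can still commit. Whether GPU memory usage eats into commit on this system is an assumption, not something established here; `summarize_memory` and `read_memory_status` are hypothetical helper names.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    # Field layout of the Win32 MEMORYSTATUSEX structure.
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),   # commit limit
        ("ullAvailPageFile", ctypes.c_uint64),   # commit still available
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def summarize_memory(total_phys, avail_phys, commit_limit, commit_avail):
    """Convert byte counts to MB and flag when commit is the tighter limit."""
    mb = 1024 * 1024
    return {
        "phys_total_mb": total_phys // mb,
        "phys_avail_mb": avail_phys // mb,
        "commit_limit_mb": commit_limit // mb,
        "commit_avail_mb": commit_avail // mb,
        # True when less commit than physical RAM is left.
        "commit_is_tighter": commit_avail < avail_phys,
    }


def read_memory_status():
    """Return the summary on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return summarize_memory(
        status.ullTotalPhys,
        status.ullAvailPhys,
        status.ullTotalPageFile,
        status.ullAvailPageFile,
    )
```

Running `read_memory_status()` with the GPU idle and again under VRAM load would show whether available commit drops much faster than available physical memory.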
I had believed GPU memory allocations were failing at higher GPU memory utilization rates. But instead it seems system allocations are. I can run OCCT's VRAM stability test and allocate 24000MB of VRAM. Windows reports 19.8/20.0 GB Dedicated GPU memory and 6.1/15.6 GB Shared GPU memory. Windows will report plenty of system memory available in those conditions, but I can't use half of it. I can get up to about 14200MB (44.5% of total reported by OCCT) or 14400MB before OCCT crashes.
This doesn't seem like an issue with any particular part except maybe a goofy Infinity Fabric thing? I don't really know where to go next with this.
Also tried using PCIe 3.0 instead of 4.0 because of the riser, and that didn't help.
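As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, and the sketch assumes decimal units (1 GB = 1000 MB), which may not match how OCCT or Windows count. Shared GPU memory lives in system RAM, so it is added to the system side.

```python
def implied_total_mb(allocated_mb, fraction):
    """Total memory implied by 'allocated_mb is this fraction of total'."""
    return allocated_mb / fraction


def system_side_gb(system_alloc_mb, shared_gpu_gb):
    """System RAM in use by the test plus shared GPU memory, in GB."""
    return system_alloc_mb / 1000 + shared_gpu_gb


# Figures quoted in the post.
total_mb = implied_total_mb(14200, 0.445)   # what OCCT treats as 100%
in_use_gb = system_side_gb(14200, 6.1)      # test allocation + shared GPU
left_gb = total_mb / 1000 - in_use_gb       # not covered by either figure

print(round(total_mb), round(in_use_gb, 1), round(left_gb, 1))
```

Under those assumptions the total works out to roughly 31.9 GB, with about 20.3 GB covered by the test allocation plus shared GPU memory. That leaves around 11.6 GB for the OS and everything else, so physical RAM does not look obviously exhausted when the allocation fails.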
Thoughts?
One way to see if the GPU card is defective is by installing it in another computer and seeing if it crashes in that PC. If it doesn't, that would indicate some hardware incompatibility between the GPU card and your PC. If it does crash in another PC, I suggest you open a Manufacturer's Warranty Ticket to see if it needs to be RMAed to be checked, repaired, replaced or refunded.
Also remember you have 2 GPUs in your PC, The 7500 IGPU and the RX 7900XT. So some Windows RAM will be assigned to your IGPU depending on what amount you have allocated in BIOS.
Open an MSI SUPPORT WARRANTY Ticket and see what they suggest.
Since you are using such a small enclosure it is possible either or both the CPU and GPU are overheating causing those errors due to lack of proper air circulation inside the PC.
Also is the 7900XT Hot spot going above 110c during your testing? If it is then it is overheating and throttling.
Thanks for your feedback. Do you suspect a defective GPU has any bearing on not being able to allocate system RAM?
I originally went to open a support ticket with MSI about the GPU but did a bit more testing and found that I can allocate all VRAM with OCCT. So that suggests to me that the GPU itself is fine. Similarly, I can allocate all system RAM when tested in isolation. It's only the combination of the two where I have a lot of problems. So I'm not sure MSI would be convinced of a faulty GPU.
> Also remember you have 2 GPUs in your PC, The 7500 IGPU and the RX 7900XT. So some Windows RAM will be assigned to your IGPU depending on what amount you have allocated in BIOS.
Windows reports 512 MB dedicated GPU memory for the integrated GPU.
> Also is the 7900XT Hot spot going above 110c during your testing? If it is then it is overheating and throttling.
In my tests the hotspot is ~80 degrees C; I don't think I've ever seen the hotspot reach 85 C. MSI's cooling on the GPU is robust and the ITX case is thoughtfully designed. It's a small volume, but the sandwich design means hot air exhaust from the GPU goes out the side of the case instead of circulating inside the case.
If I were to open a support ticket with a hardware vendor, I'm not even sure which one it would be. At this point it seems like possibly an issue with the CPU's memory controller or Infinity Fabric? But that's beyond my experience.
To tell you the truth, I would just install your GPU Card in another PC and see if it does the same thing. If it does then you probably have a defective GPU card but if it doesn't then it is some hardware, either the CPU or RAM that is causing problems with the GPU card.
Since your GPU card failed with more than one Stress testing program to me does seem to indicate a defective GPU card that under certain conditions it fails.
See if you can take it to a computer shop and have them run the same test you run and see what happens. The only problem is they might charge you a fee for that.
MSI Support might give you some insight whether they believe it is the GPU card or not. No harm in opening an MSI Support ticket.
Also open an AMD SUPPORT TICKET here: https://www.amd.com/en/forms/contact-us/support.html and see if they believe it might be a CPU issue, since the GPU, CPU and System RAM work together to render Video output.
NOTE: I am a big fan of OCCT and constantly recommend this Stress testing software. OCCT checks the GPU vRAM for errors, but it might be a different circuit in the GPU connected to the vRAM circuit that is having issues. Just guessing.
> Since your GPU card failed with more than one Stress testing program to me does seem to indicate a defective GPU card that under certain conditions it fails.
To clarify, I don't think the GPU failed. I think allocating system RAM failed when the GPU was under VRAM load. It happens when running games that allocate memory for large textures. If a lot of VRAM is in use, a game may fail to allocate memory for a texture and crash. However, I'm reasonably sure that memory is being allocated from system RAM at that point; the failure is happening before the texture is copied to VRAM. Even the Furmark tests are only failing because they use a lot of system RAM in conjunction with VRAM.
In any case, you're probably right about asking MSI support. And I should check with AMD as well, as you suggested. I imagine the first thing MSI will want is for me to reproduce the error without a PCIe riser, which will require disassembling my PC. Since that's a lot of work, I wanted to exhaust whatever options I had that didn't require disassembly.
Thanks for your advice.
Sorry I was misreading your previous replies about the GPU vRAM.
Is your RAM listed on your Motherboard's QVL List for your Processor, or is it listed on the RAM Manufacturer's site as being compatible with your motherboard?
Maybe it is a compatibility issue with your System RAM. Try using just one Stick of RAM in the A2 DIMM Slot and see what happens. Sometimes with 2 or more sticks, if the RAM is not 100% compatible, it will cause issues that don't show up with one stick of RAM.
Maybe the RAM needs to be configured in BIOS to make it more stable during loads. Just throwing out possible scenarios. Not an expert when it comes to configuring RAM in BIOS though.