Hello
I've noticed that my RX 7900 XTX's performance drops after a few minutes.
I have a rather simple Vulkan program I've written that loads PNG/JPG textures and generates mipmaps using blitting. Before the performance drops, it loads and generates mipmaps for ~30 textures in ~60 ms. After the performance drops, it takes ~1000 ms (roughly 17 times slower!).
Another program, which uses a compute shader to outline a model, drops from ~9000 to ~3000 FPS.
From my observation, this performance drop mostly affects programs that constantly write to GPU memory. At first I thought that maybe different regions of memory have different speeds or something, but that is not the case, since I can fill my GPU memory to the brim rather quickly before the performance drops.
Vulkan implementations I tried: AMD open-source driver 2023.Q2.2 (LLPC) and RADV (Mesa 23.0.3). Both are affected by the performance drop.
After logging out and back in (which resets the Wayland/X session), performance goes back to normal for a few minutes again.
What can be the cause of this? Driver? Kernel?
Everything seems to be OK in Windows.
Some info:
CPU: 7800X3D
OS: Manjaro Linux (KDE) (6.3.5-2-MANJARO)
My first impression is like yours: it appears to be a memory problem, where memory fills up and performance then takes a significant hit.
Is there any way that you can perhaps create a cache on an NVMe drive and increase the memory capacity?
I don't think that it is a memory problem. At least not a hardware one, since everything is OK in Windows. Besides, even in Linux after a "fresh start" I can almost instantly fill my graphics card with ~24 GB of textures using the aforementioned Vulkan program. Needless to say, after the bug kicks in it takes centuries to load those textures.
Well, just in case someone is curious: I've "played" with the amdgpu_stress utility and here's what I got. It seems like reads/writes from/to GTT memory are the culprits here.
Here's the output of the "amdgpu_stress -b g 512m -b v 512m -c 1 2 512m 4" command (copy 512 MB from GTT to VRAM 4 times) before this nasty bug kicks in:
Allocated BO number 0 at 0x100000000, domain 0x2, size 2097152
Allocated BO number 1 at 0x100200000, domain 0x2, size 536870912
Allocated BO number 2 at 0x120200000, domain 0x4, size 536870912
Submitted 4 IBs to copy from 1(100200000) to 2(120200000) 536870912 bytes took 77083 usec
Here's the output of the same command after the bug kicks in:
Allocated BO number 0 at 0x100000000, domain 0x2, size 2097152
Allocated BO number 1 at 0x100200000, domain 0x2, size 536870912
Allocated BO number 2 at 0x120200000, domain 0x4, size 536870912
Submitted 4 IBs to copy from 1(100200000) to 2(120200000) 536870912 bytes took 9644378 usec
So, basically ~125 times slower...
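For anyone who wants to compare runs without doing the arithmetic by hand, here is a minimal Python sketch that pulls the byte count and elapsed time out of the "Submitted ... IBs" line and computes the slowdown. The function names are made up for illustration, and the throughput figure assumes the reported time covers one 536870912-byte copy; if the time actually covers all 4 IBs, scale it accordingly.

```python
import re

# Matches the summary line printed in the amdgpu_stress logs quoted above.
LINE_RE = re.compile(r"Submitted (\d+) IBs .* (\d+) bytes took (\d+) usec")

def parse_copy_line(line):
    """Return (ib_count, byte_count, elapsed_usec) from a 'Submitted ...' line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a copy summary line")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))

def slowdown(before_line, after_line):
    """Ratio of elapsed times (after / before) for two runs of the same copy."""
    _, _, before_usec = parse_copy_line(before_line)
    _, _, after_usec = parse_copy_line(after_line)
    return after_usec / before_usec

def throughput_mb_s(line):
    """Bytes per microsecond equals MB/s (10^6 bytes per second).

    Assumption: the reported time covers a single copy of byte_count bytes.
    """
    _, byte_count, usec = parse_copy_line(line)
    return byte_count / usec

BEFORE = ("Submitted 4 IBs to copy from 1(100200000) to 2(120200000) "
          "536870912 bytes took 77083 usec")
AFTER = ("Submitted 4 IBs to copy from 1(100200000) to 2(120200000) "
         "536870912 bytes took 9644378 usec")

if __name__ == "__main__":
    print(f"slowdown: {slowdown(BEFORE, AFTER):.1f}x")
    print(f"before: {throughput_mb_s(BEFORE):.0f} MB/s")
    print(f"after:  {throughput_mb_s(AFTER):.0f} MB/s")
```

Running it on the two log lines above reproduces the ~125x figure.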
I've run a lot of RAM tests (allocating, writing random data, copying, deallocating), but RAM performs as expected. Copying from VRAM to VRAM also performs as expected. It seems like only GTT<->VRAM is affected.
Disabling AMD Smart Access Memory or reducing the GTT size from ~30 GB (the default, 1/2 of RAM) to 3 GB (the minimum) didn't help. So I guess I'm moving to Windows for good...
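To confirm which GTT and VRAM sizes the driver actually ended up with after changing such settings, something like the following Python sketch may help. It reads the mem_info_* counters that amdgpu exposes in sysfs (values in bytes). The card0 path is an assumption and may differ on systems with more than one GPU, and the function name is just illustrative.

```python
from pathlib import Path

# Counters exposed by amdgpu under the device's sysfs directory, in bytes.
MEM_INFO_FILES = [
    "mem_info_vram_total",
    "mem_info_vram_used",
    "mem_info_gtt_total",
    "mem_info_gtt_used",
]

def read_mem_info(device_dir):
    """Return {counter_name: bytes} for every counter file that exists."""
    info = {}
    for name in MEM_INFO_FILES:
        path = Path(device_dir) / name
        if path.exists():
            info[name] = int(path.read_text().strip())
    return info

if __name__ == "__main__":
    # Assumption: the RX 7900 XTX is card0; adjust if it is card1, etc.
    for name, value in read_mem_info("/sys/class/drm/card0/device").items():
        print(f"{name}: {value / 2**30:.2f} GiB")
```

Counters that are missing on a given system are simply skipped rather than treated as errors.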
I think that I found a solution. Just before moving my Linux partition to /dev/null I decided to give it one last try. There is a list of available amdgpu module parameters here:
https://docs.kernel.org/gpu/amdgpu/module-parameters.html
I skimmed through this list and added the ones I found interesting to my kernel command line. Here is what I added:
amdgpu.msi=0 amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 amdgpu.vm_update_mode=0 amdgpu.exp_hw_support=1 amdgpu.sched_jobs=64 amdgpu.sched_hw_submission=4 amdgpu.lbpw=0 amdgpu.mes=1 amdgpu.mes_kiq=1 amdgpu.sched_policy=1 amdgpu.ignore_crat=1 amdgpu.no_system_mem_limit amdgpu.smu_pptable_id=0
The problem disappeared! Not only that, I have a 5-10% performance boost now. The window maximize/minimize animation used to be jerky as hell (which I thought was a Wayland "feature") and it is very smooth now. I think even the coil whine is reduced, but I'm not sure.
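Since that is a long list and it is not clear which parameter actually made the difference, it can help to check what the kernel was really booted with before removing them one at a time. Below is a minimal Python sketch, assuming a plain space-separated command line without quoting; the function name is made up for illustration. It only shows what was passed at boot, not whether the driver accepted each value (the values the loaded module reports can be checked under /sys/module/amdgpu/parameters/).

```python
def parse_amdgpu_params(cmdline):
    """Map amdgpu.<name>[=<value>] tokens to {name: value}.

    Parameters given without '=' (such as amdgpu.no_system_mem_limit in the
    post above) are stored with the value None.
    """
    params = {}
    for token in cmdline.split():
        if not token.startswith("amdgpu."):
            continue
        name, sep, value = token[len("amdgpu."):].partition("=")
        params[name] = value if sep else None
    return params

if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            active = parse_amdgpu_params(f.read())
    except OSError:
        active = {}  # not on Linux, or /proc is unavailable
    for name, value in sorted(active.items()):
        print(f"amdgpu.{name} = {value}")
```

Bisecting the list this way would show whether a single parameter is responsible for the fix.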
That is great news. Just one question, though: I notice that you're using the DEV version, and I am curious whether other versions, such as Red Hat, are having these issues.
I'm pretty sure that this is a driver issue, since changing the amdgpu kernel module parameters fixes it. The X driver I'm using is xf86-video-amdgpu 23.0.0 from the official Manjaro repository. I don't know which driver version Red Hat installs by default, so I can't be sure whether Red Hat is affected.
Part of the issue is when you launch a max-quality-settings Vulkan game title like, say, "Strange Brigade" at 1080p and see your FPS in the 4000s, and it stays that way for about 60 seconds of each map, then it caps it back down to crappy, lousy, not-a-computer Intel/Nvidia-like FPS. Well, you need to ensure the Ryzen Balanced power plan is properly and manually loaded from the extracted drivers and chipset software, and install every chipset driver manually, or run the setup from the chipset IO packages folder after the main setup has completed and extracted it.
You will want to specify something like adaptivetarget99999999FPS or similar, and do use frameskip, as Nvidia, Intel and Microsoft load in fake overlays and bloat and add a trillion FPS that are not frames, or are false frames. Use add m.GPU, add m.CPU, add m.usb, add m.hdmi and whatever else you can think of to pretend to pretend to multicore, coz Intel and Nvidia are NOT a computer at all. Enable PBO, but it's often your BIOS or OS having some **bleep**ty fake Ryzen Master nonsense loaded in somewhere in Task Scheduler; no clue where to look on Linux. On Windows you have to monstrously fill the DirectX 12 feature level with billions of FF values. I dunno how to even begin to pretend to fix Linux. But try specifying CAPABILITY VULKAN 9999 or 2000, or use wildcards, then x99999999, like several of those. Maybe use words like quantuminfinity, and be sure to use quantum bit depth. I don't know the correct amount of hybridrealityzenemulation to use, or countless other things. My custom guesses Android/Windows config file, which is a text file renamed to .ini, should work for Linux just fine. Give that a go. Just be sure to never use anything ECO or power saving on a gaming system. Those are for, like, cheap handheld smart watches or whatever that don't need full PCI Express power in the bus or RAM.
bit like previous.rar ~ pixeldrain
I use about 12 copies of the .ini file on Linux. Windows looks better, as I double up in the registry and such. I have to run the dxfeaturelevel reg key each reboot; the others just the once? I do sorta rituals each boot and game load to pretend to computer, coz I have a REAL AMD computer, not a fake, not-math, not-computer Intel and Nvidia trash. On Linux the .ini file might go into some etc or conf or config folder; I can't recall where to place it to have the system recognise it. But EVERY OS in the universe requires an INITIALIZATION file for boot/BIOS/display/settings and resolution, or other things like which drivers to load and use, that can be manually specified. Remember in the DOS days when you had to type them up for everything, to specify how much RAM or extended or base memory an app had, or to load sound card drivers and configure them and what ports to use and such? That has never ever changed, and just has some lame generic defaults plugged in somewhere.
The thing is, all computers do maths, which isn't a fixed value but can be freely edited or changed in realtime as the app is running; change the maths values and the whole computer changes. You can take a bunch of maths constants and functions and make them into a button. But a calculator has a fixed number key value like 9 and doesn't let you adjust it as it does the maths, and it is not INFINITE (algebra x could be an infinite number of possible things) and it's not eternal (not-computers can't do infinity and eternity, or do complex maths outside of time and space instantly, or faster than reality with negative latency). All computers are better than analog, or at least analog realtime in sync with reality, and can do all the maths there ever was and ever will be, which is kinda what the big deal about the Ryzen ZEN core is. So if your config files don't work with your hardware, they may be able to work 'anywhere' or something, or a setting is blocking/disabling it where you put it, or you didn't buy a computer and it's the cheapest, fakest calculator or digital alarm clock posing as one, like fakes such as Intel/Nvidia, who have no intel and no video for realz.
The performance drop you are experiencing with your RX 7900 XTX GPU in Vulkan programs could be caused by several factors. Here are a few possibilities to consider:
GPU throttling: High GPU temperatures can lead to throttling, where the GPU reduces its clock speed to prevent overheating. Monitor the GPU temperature during operation using a tool like "radeon-profile" or "sensors" (from lm-sensors), watch GPU load with "radeontop", and check whether the temperature rises significantly before the performance drop occurs. Ensure proper cooling and ventilation for your system.
Power management: Check your power management settings and make sure your GPU is not being limited by power-saving features. Adjust the power profile to prioritize performance rather than power efficiency.
Driver issues: Drivers play a crucial role in GPU performance. Ensure that you are using the latest stable drivers for your RX 7900 XTX GPU. Consider checking for any driver updates or patches that address performance-related issues specifically.
Kernel issues: Occasionally, certain kernel versions may have compatibility or performance issues with specific hardware. Verify that you are running the latest stable kernel version and check if there are any known issues reported with your GPU model.
Resource management: Insufficient memory allocation or resource management within your Vulkan programs could potentially impact performance. Make sure you are properly managing resources, including buffers and textures, and avoiding excessive memory fragmentation or leaks.
Wayland/X11 compatibility: The fact that performance recovers after logging out and back in (resetting the session) might indicate a compatibility issue between your GPU and the Wayland or X11 display server. Consider checking for any relevant bug reports or compatibility issues between your GPU, Vulkan, and the display server you are using.
Given that you mentioned Windows operates normally, it suggests that the issue could be related to the driver, kernel, or compatibility with the Linux environment you are using. Monitoring temperature, updating drivers and kernel, and ensuring proper resource management can help diagnose and potentially address the performance drop. If the issue persists, reaching out to the AMD community or support channels might provide more specific guidance for your GPU model and Vulkan programming scenario.
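The temperature and power-management checks suggested above can also be done without extra tools, by reading what amdgpu exposes in sysfs. A minimal Python sketch follows, assuming the GPU is card0 and that the first hwmon temperature sensor (temp1_input, in millidegrees Celsius) is the one of interest; the function name is illustrative only.

```python
from pathlib import Path

def read_gpu_status(device_dir):
    """Collect temperature (deg C) and DPM performance level, if exposed."""
    device = Path(device_dir)
    status = {}
    # hwmon reports temperatures in millidegrees Celsius.
    for temp_file in sorted(device.glob("hwmon/hwmon*/temp1_input")):
        status["temp_c"] = int(temp_file.read_text().strip()) / 1000.0
        break
    # Current power-management mode, e.g. "auto", "low", "high" or "manual".
    level_file = device / "power_dpm_force_performance_level"
    if level_file.exists():
        status["perf_level"] = level_file.read_text().strip()
    return status

if __name__ == "__main__":
    # Assumption: the GPU under test is card0.
    print(read_gpu_status("/sys/class/drm/card0/device"))
```

Logging these values once per second while the test program runs would show whether the slowdown coincides with a temperature rise or a change in performance level.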