I have divided this post into two parts to avoid confusion. The second (PART II) is the main and most important one.
I built a system with 8x RX 6600 XT cards running Ubuntu. Installing the driver and running all the cards at once was really easy, but when I tried to overclock them I ran into many issues. The biggest was setting a static fan speed from the terminal. All the articles I found were somewhat outdated (archlinux and dri.freedesktop, just to name a few). It took me a while to figure out how to overclock the GPUs on Ubuntu, because nothing would work. Even the overclocking tools listed on archlinux would not apply a memory overclock. Thankfully, I eventually managed to overclock the memory and core of each GPU individually from the command line. Surprisingly, though, I was unable to modify the fan speed file. Redoing all of this after every reboot is also painful, so I tried writing a Bash script, but it failed to modify the overclocking files (I made sure it had root privileges). Right now I overclock the memory and core from the terminal. For the fans, I gave up on the command line and use CoreCtrl for that specific task.
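For readers unfamiliar with it, a command-line overclock on amdgpu usually comes down to a few writes into each card's sysfs directory. Below is a minimal sketch of that idea, not necessarily the exact steps used above. It assumes the amdgpu overdrive interface (`pp_od_clk_voltage`) with the Navi-style `s`/`m`/`c` commands, that overdrive has already been enabled through the `amdgpu.ppfeaturemask` kernel parameter, and that the performance level has to be set to `manual` first. The function names and the clock values in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import List


def od_commands(sclk_max_mhz: int, mclk_max_mhz: int) -> List[str]:
    """Build the command strings for pp_od_clk_voltage.

    Assumes Navi-style syntax: "s 1 <MHz>" sets the maximum core clock,
    "m 1 <MHz>" sets the maximum memory clock, and "c" commits the changes.
    """
    return [f"s 1 {sclk_max_mhz}", f"m 1 {mclk_max_mhz}", "c"]


def apply_overclock(device_dir: Path, sclk_max_mhz: int, mclk_max_mhz: int) -> List[str]:
    """Write the overclock commands into one card's sysfs device directory.

    device_dir is expected to look like /sys/class/drm/card0/device.
    Must run as root on real hardware.
    """
    # Assumption: the overdrive table only accepts writes in manual mode.
    (device_dir / "power_dpm_force_performance_level").write_text("manual\n")

    cmds = od_commands(sclk_max_mhz, mclk_max_mhz)
    for cmd in cmds:
        # Each command is sent as its own open/write/close cycle, because the
        # driver parses one command per write.
        (device_dir / "pp_od_clk_voltage").write_text(cmd + "\n")
    return cmds


# Hypothetical usage (values are placeholders, not recommendations):
#   apply_overclock(Path("/sys/class/drm/card0/device"), 2600, 1075)
```

Taking the device directory as a parameter keeps the sketch testable against a scratch directory, and makes it easy to loop over all eight cards.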
Now my questions for this part of the post are:
Q1. How can I manually adjust the fan speed from the command line? (As I said, most of the information I found is outdated.)
Q2. Why might my Bash script fail to modify the memory and core overclock files? (I have done this on my other system, which runs NVIDIA cards: I put the same overclocking commands in a Bash file and just run it.)
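To make Q1 concrete, here is a minimal sketch of the kind of fixed fan speed write being asked about. It assumes the amdgpu hwmon interface, where writing `1` to `pwm1_enable` selects manual control and `pwm1` takes a value from 0 to 255. Whether this works unchanged on every kernel version is an assumption. The helper names are hypothetical. The hwmon directory is located with a glob rather than a hard-coded index, since the `hwmonN` number is not guaranteed to be the same after every reboot, which is one possible reason a script with fixed paths might fail.

```python
from pathlib import Path


def percent_to_pwm(percent: int) -> int:
    """Convert a 0-100 fan percentage to the 0-255 PWM scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent * 255 // 100


def find_hwmon_dir(device_dir: Path) -> Path:
    """Locate the hwmonN directory under a card's sysfs device directory."""
    candidates = sorted((device_dir / "hwmon").glob("hwmon*"))
    if not candidates:
        raise FileNotFoundError(f"no hwmon directory under {device_dir}")
    return candidates[0]


def set_fan_speed(device_dir: Path, percent: int) -> int:
    """Switch the fan to manual control and set a static speed.

    device_dir is expected to look like /sys/class/drm/card0/device.
    Must run as root on real hardware. Returns the PWM value written.
    """
    hwmon = find_hwmon_dir(device_dir)
    pwm = percent_to_pwm(percent)
    # Manual mode has to be selected before pwm1 accepts a value.
    (hwmon / "pwm1_enable").write_text("1\n")
    (hwmon / "pwm1").write_text(f"{pwm}\n")
    return pwm


# Hypothetical usage for the first card at 70% fan speed:
#   set_fan_speed(Path("/sys/class/drm/card0/device"), 70)
```

Writing `2` to `pwm1_enable` should hand control back to the automatic fan curve.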
PART II (The Killer Issue)
As I explained, I eventually got everything to work using the methods above. But after a while I rebooted the machine and started my usual post-reboot routine: open CoreCtrl and set the fan speed for each of the 8 GPUs, then jump to the terminal and change the overclock values for each one. However, when I finished overclocking I realized that OpenCL could not find the devices anymore. So, after checking the BIOS and the driver, I tried troubleshooting the hardware, and that is where I ended up with a black screen. Now I cannot access the GUI anymore, even when I use only one GPU. I believe it has to do with the Xorg configuration file, but I just cannot figure it out. With NVIDIA, this issue (a black screen after the logo) arises when changing the PCIe slot or even plugging the HDMI cable into a different GPU, and it can be solved simply by running nvidia-xconfig to regenerate the Xorg configuration file for the connected cards.
So my questions are:
Q3. How can I reset the configuration file after changing the number of GPUs used or the slot for the GPU?
Q4. Is there a way to generate that file using the amdgpu driver? If not, do you provide a tool that does? I really cannot figure out how to write it myself, even though I have read the Xorg syntax many times!
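To illustrate what such a generated file would need to contain, here is a minimal sketch that builds an Xorg `Device` section for one card. It assumes the card's PCI address is taken from `lspci`, which prints it in hexadecimal, while the Xorg `BusID` option expects decimal numbers in the form `PCI:bus:device:function`. The function names and the `AMD0` identifier are hypothetical, and this is not an official amdgpu tool.

```python
def lspci_to_busid(address: str) -> str:
    """Convert an lspci address such as "0a:00.0" to an Xorg BusID.

    lspci prints bus, device, and function in hex. Xorg wants decimal.
    An optional leading PCI domain ("0000:0a:00.0") is dropped here.
    """
    parts = address.strip().split(":")
    if len(parts) == 3:
        parts = parts[1:]  # discard the domain
    if len(parts) != 2:
        raise ValueError(f"unexpected PCI address: {address!r}")
    bus_hex, dev_func = parts
    dev_hex, func_hex = dev_func.split(".")
    return f"PCI:{int(bus_hex, 16)}:{int(dev_hex, 16)}:{int(func_hex, 16)}"


def device_section(identifier: str, address: str) -> str:
    """Render a Device section that pins the amdgpu driver to one card."""
    return (
        'Section "Device"\n'
        f'    Identifier "{identifier}"\n'
        '    Driver "amdgpu"\n'
        f'    BusID "{lspci_to_busid(address)}"\n'
        'EndSection\n'
    )


# Hypothetical usage: print a section for the card lspci shows at 0a:00.0
#   print(device_section("AMD0", "0a:00.0"))
```

A stale `BusID` left over from a previous slot layout is one plausible cause of the black screen, since the address changes when a card moves to a different slot.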