Hi,
I am trying to use ROCm with PyTorch on my computer. The ROCm + PyTorch Docker image does not work: it cannot access the GPU at all.
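To narrow this down, here is a minimal sketch of a check that could be run inside the container to see whether the GPU device nodes are visible at all. It assumes the usual ROCm setup, where the container is started with `--device=/dev/kfd --device=/dev/dri`. The function name `check_gpu_device_nodes` is a hypothetical helper of my own, not part of ROCm or PyTorch.

```python
import os


def check_gpu_device_nodes(paths=("/dev/kfd", "/dev/dri")):
    """Report whether each device node exists and is readable.

    The default paths are an assumption: they are the nodes a ROCm
    container is normally given via `docker run --device=...`.
    """
    report = {}
    for path in paths:
        exists = os.path.exists(path)
        # A node can exist but be unreadable if the user is not in
        # the right group (commonly `video` or `render`).
        readable = exists and os.access(path, os.R_OK)
        report[path] = {"exists": exists, "readable": readable}
    return report


if __name__ == "__main__":
    for path, status in check_gpu_device_nodes().items():
        print(f"{path}: exists={status['exists']}, readable={status['readable']}")
```

If both nodes are missing inside the container, the problem is more likely in how the container was started than in the driver itself.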
I suspected a driver problem, so I updated amdgpu-dkms on my Ubuntu 22.04.2 system. The amdgpu driver bundled with Ubuntu works fine, but after the update the new driver does not work at all, and I cannot even boot Ubuntu. The error looks like this: [drm:amdgpu_ras_eeprom_init [amdgpu]] *ERROR Failed to read EEPROM table header, ret:-5
How can I fix this? Should I stay on the amdgpu driver bundled with Ubuntu and downgrade the ROCm version in the Docker image as far as possible?
Thanks in advance!