Hello, I installed the latest drivers on Ubuntu 20.04 and got the following output, which shows that my GPUs are not recognised. Can you help me solve this?
~/Downloads/amdgpu-pro-20.45-1188099-ubuntu-20.04$ clinfo
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3188.4)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No devices found in platform
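For anyone else who sees "Number of devices 0" like above, here is a rough sketch of the basic checks worth running first. It assumes the amdgpu kernel driver is in use, that access goes through /dev/kfd and /dev/dri/renderD128 (the render node number may differ on your machine), and that the user needs to be in the video and render groups. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: basic checks when clinfo reports "Number of devices 0".
# Assumptions: amdgpu kernel driver, device nodes /dev/kfd and
# /dev/dri/renderD128, access controlled by the video and render groups.

# in_group GROUP LIST: succeed if GROUP is in the space-separated LIST.
in_group() {
    local group="$1" list="$2"
    case " $list " in
        *" $group "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_gpu_access: print one "ok:" or "missing:" line per check.
check_gpu_access() {
    local groups g node
    groups="$(id -nG)"
    for g in video render; do
        if in_group "$g" "$groups"; then
            echo "ok: user is in group $g"
        else
            echo "missing: user is not in group $g"
        fi
    done
    for node in /dev/kfd /dev/dri/renderD128; do
        if [ -e "$node" ]; then
            echo "ok: $node exists"
        else
            echo "missing: $node"
        fi
    done
    # /proc/modules lists loaded kernel modules, one per line.
    if [ -r /proc/modules ] && grep -q '^amdgpu ' /proc/modules; then
        echo "ok: amdgpu module is loaded"
    else
        echo "missing: amdgpu module is not loaded"
    fi
}
```

Source the file and run `check_gpu_access`. A "missing" group can usually be fixed with `sudo usermod -aG video,render "$USER"` followed by logging out and back in.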
Here are my results after some fights with the system :-). Now the only error is as follows. Can you help? Thanks!
Memory access fault by GPU node-1 (Agent handle: 0x557045d4c970) on address 0x7f670dd88000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
I have a similar error with an RX 5700 XT (listed below) when I try to bake textures in Blender. As far as I know, this error is related to the drivers and should be fixed in the near future (at least the variant that shows up on my machine).
Memory access fault by GPU node-1 (Agent handle: 0x7f9faac09b00) on address 0x7f9e1782c000. Reason: Page not present or supervisor privilege. ./apps.sh: line 34: 84735 Aborted (core dumped)
I have a few useful commands for this. It took many hours to figure them out, so maybe they will save someone else some time. Thanks!
P.S. I wanted to avoid special characters getting broken, which is why they are in the image; sorry for the inconvenience ;-). Go, Go, AMD! :D
Btw, I recently spent almost an entire night trying to get an Azure VM with an MI25 GPU working. I was trying to set up ROCm there, also on Ubuntu 20.04 (upgraded from 18.04), and it did not work at all with the 20.45 drivers for Linux and/or ROCm... AMD, come on! Please at least check your PRO hardware against your drivers. I had no time, so I switched to the "green company" for the moment... Please fix ROCm and AMDGPU as fast as possible... I invested in 4 x Radeon RX 6900 XT from eBay, so it is no fun so far. I am not a gamer and do not like Windows, sorry to say... I do professional CNN / DNN / AI / ML work... please consider my frustration ;-/.
I read this topic by coincidence - it seems you have built a nice rig.
As I am in a related line of work (though this time round without the GPU work), I would also recommend going for TRX40 for its great PCIe 4.0 support, which, thanks to its superb throughput, has already saved me countless hours of transferring data to and from the CPU.
See the speeds with 980 PRO NVMe on RAID 0 here: https://pcpartpicker.com/b/dT7TwP
Also consider very low-latency memory. Mine is far from perfect, a kind of engineering trade-off really, until memory at over 4600 MHz CAS 17 (or lower) becomes available at a sensible price. In the same way, I chose not to get a decent GPU yet.
Very interesting, but in my case I forgot to show what is logged. Maybe that would help AMD staff track it down :D.
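In case it helps others collect the same kind of log, here is a small sketch for pulling the GPU-related lines out of the kernel log. The search patterns (amdgpu, kfd, iommu) are only my guess at what is relevant to this fault, and the function name is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: keep the kernel log lines that mention the GPU driver stack.
# Assumption: the lines of interest contain amdgpu, kfd or iommu.

# filter_gpu_log: read a log on stdin, print the matching lines.
filter_gpu_log() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'amdgpu|kfd|iommu' || true
}
```

Usage: `sudo dmesg | filter_gpu_log > gpu-dmesg.txt`, then attach gpu-dmesg.txt to the post.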
It almost works! I found a solution thanks to dmesg and a web search! :D I wonder if open discussions are tracked by AMD?
To solve it, I tried adding the following line to /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash drm.rnodes=1 radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 intel_iommu=off"
sudo update-grub2
sudo reboot
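After the reboot it is worth checking that the kernel actually booted with these parameters. Here is a minimal sketch that compares a kernel command line against a list of expected parameters. The function names are made up for this example, and it only checks that the parameters were passed, not that the driver honoured them.

```shell
#!/usr/bin/env bash
# Sketch only: check which expected parameters are on the kernel command line.

# has_param CMDLINE PARAM: succeed if PARAM is one of the
# space-separated entries in CMDLINE.
has_param() {
    local cmdline="$1" param="$2"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_params CMDLINE PARAM...: print "active:" or "absent:" per PARAM.
report_params() {
    local cmdline="$1" p
    shift
    for p in "$@"; do
        if has_param "$cmdline" "$p"; then
            echo "active: $p"
        else
            echo "absent: $p"
        fi
    done
}
```

Usage: `report_params "$(cat /proc/cmdline)" amdgpu.si_support=1 amdgpu.cik_support=1 intel_iommu=off`. If a parameter shows as absent, update-grub2 probably did not pick up the edited file.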
thanks!
Did you get rid of the "Memory access fault by GPU node-1" with this solution?
Unfortunately NOT... yet, I hope, even if hope is not a strategy!