Hi,
I am using the latest driver on Ubuntu 20.04.
However, when I run even the simplest OpenCL and PyTorch commands (clinfo and .cuda()), memory access fails.
The errors are shown below; the main problem is memory access to my Radeon VII.
OpenCL:
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
    Coarse grain buffer: Yes
    Fine grain buffer: Yes
    Fine grain system: No
    Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
ERROR: clCreateKernel(-6)
PyTorch:
import torch

# Select the Radeon VII (device index 2 on this machine)
torch.cuda.set_device(2)
print(f"running with device: {torch.cuda.get_device_name(torch.cuda.current_device())}")
# prints: running with device: AMD Radeon VII

a = torch.rand((1, 1)).float()
a = a.to(torch.device('cuda'))  # copy the tensor to the current GPU; this line crashes
# error message: Memory access fault by GPU node-10 (Agent handle: 0x7538750) on address (nil). Reason: Page not present or supervisor privilege.
# Aborted (core dumped)
# sometimes it returns instead: Segmentation fault (core dumped)
I have tried many things: flashing several vBIOS versions (even the Radeon Pro VII one), trying many driver versions, and switching between several Ubuntu versions.
I do not know how to solve this. Should I try a much older driver, e.g. one from three years ago?
Can anyone help me? Thanks a lot.