PC Graphics

chrirocca
Journeyman III

RCCL on PCIe

Hello,

I have access to a pair of MI100 GPUs connected only over PCIe.
I used a ROCm Docker image based on Ubuntu 22.04 as a base and installed RCCL in it.
I installed MPI and then cloned the RCCL tests repository to test communication between the GPUs. After building it, when I run the examples from the repository's usage section on 2 GPUs, execution deadlocks, with VRAM usage at maximum and core utilization at 0%. I tried several versions of ROCm and RCCL in case this was a bug in the latest releases, but it happens every time.
I couldn't find any information about this, so I wanted to ask: what could the problem be? For comparison, I followed the same procedure on NVIDIA, and the examples run without any problems on my 2 V100s connected over PCIe.
From what I understand, RCCL should support PCIe connections, so I don't think that is the problem, and this apparently happens with the latest 2 versions of ROCm/RCCL. Do you have any suggestions?



