Hello,
I have a server with two MI100 accelerators connected through the AMD bridge.
The OS installed is CentOS 7.9. I am able to see the two GPUs with these two commands:
/opt/rocm/bin/rocminfo
/opt/rocm/opencl/bin/clinfo
but I don't know how to tell whether the bridge is working. Is there a specific command I should run to check it?
thank you
Hi,
Can you share the platform being used to run the cards?
We do have a ROCm bandwidth test utility that can be used to see if the bridge is working. It's included in the ROCm install:
$ rocm-bandwidth-test -t
or
For stress testing too, install rocm-bandwidth-test (e.g. yum install rocm-bandwidth-test). It will allow you to test bandwidth between the GPUs:
/opt/rocm/bin/rocm-bandwidth-test ?
Supported arguments:
-h Prints the help screen
-q Query version of the test
-v Run the test in validation mode
-l Run test to collect Latency data
-c Time the operation using CPU Timers
-e Prints the list of ROCm devices enabled on platform
-i Initialize copy buffer with specified 'long double' pattern
-t Prints system topology and allocatable memory info
-m List of buffer sizes to use, specified in Megabytes
-b List devices to use in bidirectional copy operations
-s List of source devices to use in copy unidirectional operations
-d List of destination devices to use in unidirectional copy operations
-a Perform Unidirectional Copy involving all device combinations
-A Perform Bidirectional Copy involving all device combinations
NOTE: Mixing following options is illegal/unsupported
Case 1: rocm_bandwidth_test -a with {lm}{1,}
Case 2: rocm_bandwidth_test -b with {clv}{1,}
Case 3: rocm_bandwidth_test -A with {clmv}{1,}
Case 4: rocm_bandwidth_test -s x -d y with {lmv}{2,}
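For example, here is a minimal sketch of checking the link between the two cards (an assumption on my part: the two MI100s show up as devices 1 and 2; verify the actual indices with -e first):
# List the ROCm devices and the indices the tool assigns them
/opt/rocm/bin/rocm-bandwidth-test -e
# Unidirectional copy from device 1 to device 2
/opt/rocm/bin/rocm-bandwidth-test -s 1 -d 2
# Bidirectional copy between devices 1 and 2
/opt/rocm/bin/rocm-bandwidth-test -b 1,2
With the bridge active, the GPU-to-GPU numbers should be noticeably higher than what a PCIe-only link would give.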
Another option:
Use rocm-smi (/opt/rocm/bin/rocm-smi) to show the GPU topology and print how the GPUs are connected (PCIE or XGMI), e.g.:
rocm-smi --showtopo
…..
========================== Link Type between two GPUs ==========================
       GPU0   GPU1   GPU2   GPU3   GPU4   GPU5
GPU0   0      PCIE   PCIE   PCIE   PCIE   PCIE
GPU1   PCIE   0      PCIE   PCIE   PCIE   PCIE
GPU2   PCIE   PCIE   0      PCIE   PCIE   PCIE
GPU3   PCIE   PCIE   PCIE   0      PCIE   PCIE
GPU4   PCIE   PCIE   PCIE   PCIE   0      PCIE
GPU5   PCIE   PCIE   PCIE   PCIE   PCIE   0
…
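A quick way to pull out just the link type on a two-GPU system (my own sketch, assuming the output format above):
# Print the link-type matrix; with a working bridge the GPU-to-GPU
# entries should read XGMI rather than PCIE
/opt/rocm/bin/rocm-smi --showtopo | grep -A 3 "Link Type"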
Regards,
Guy
Thanks for the answer, Guy.
So, here is my problem:
Without the bridge, the two MI100s are seen by the system:
========================== Link Type between two GPUs ==========================
       GPU0   GPU1
GPU0   0      PCIE
GPU1   PCIE   0
******************
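For comparison, this is what I would expect to see with the bridge working (my illustration, not actual output), with XGMI instead of PCIE:
========================== Link Type between two GPUs ==========================
       GPU0   GPU1
GPU0   0      XGMI
GPU1   XGMI   0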
If I connect them with the bridge, however, the system no longer sees them at all.
The bridge is a 4-slot bridge, but I only have two MI100s.
Is it possible that it does not work because the 4-slot bridge is populated with only two MI100s?
Regards
Hi,
Unfortunately, I have confirmed that the Infinity Fabric bridge card requires 4 GPUs, and we do not have a bridge card for 2-GPU configurations. You would have to populate the other 2 GPU slots for it to work.
Guy
What does the bridge do for them in terms of performance, compared to not using it, if you did have 4 GPUs?