
question12345
Journeyman III

Bridge for AMD MI100 Accelerator

Hello,

I have a server with two MI100 Accelerators, connected through the AMD bridge.

The OS installed is CentOS 7.9. I can see both GPUs with these two commands:
/opt/rocm/bin/rocminfo
/opt/rocm/opencl/bin/clinfo

but I don't know how to tell whether the bridge is working. Is there a specific command I can run to check it?

thank you

1 Solution

Hi,

Can you share the platform being used to run the cards? 

We do have a ROCm bandwidth test utility that can be used to check whether the bridge is working. It is included in the ROCm install.

$ rocm-bandwidth-test -t

or 

For stress testing, install rocm-bandwidth-test (e.g. yum install rocm-bandwidth-test). It will allow you to test bandwidth between the GPUs:

/opt/rocm/bin/rocm-bandwidth-test ?

Supported arguments:

         -h    Prints the help screen
         -q    Query version of the test
         -v    Run the test in validation mode
         -l    Run test to collect Latency data
         -c    Time the operation using CPU Timers
         -e    Prints the list of ROCm devices enabled on platform
         -i    Initialize copy buffer with specified 'long double' pattern
         -t    Prints system topology and allocatable memory info
         -m    List of buffer sizes to use, specified in Megabytes
         -b    List devices to use in bidirectional copy operations
         -s    List of source devices to use in unidirectional copy operations
         -d    List of destination devices to use in unidirectional copy operations
         -a    Perform Unidirectional Copy involving all device combinations
         -A    Perform Bidirectional Copy involving all device combinations

         NOTE: Mixing following options is illegal/unsupported
                 Case 1: rocm_bandwidth_test -a with {lm}{1,}
                 Case 2: rocm_bandwidth_test -b with {clv}{1,}
                 Case 3: rocm_bandwidth_test -A with {clmv}{1,}
                 Case 4: rocm_bandwidth_test -s x -d y with {lmv}{2,}
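As a sketch of how these options combine for a two-GPU check (the device indices 1 and 2 here are illustrative, not guaranteed; run with -e first, since CPU agents also appear in the enumeration):

```shell
#!/bin/sh
# List the ROCm devices and their indices (CPU agents appear here too)
/opt/rocm/bin/rocm-bandwidth-test -e

# Unidirectional copy between two devices; replace 1 and 2 with the
# indices of the two MI100s as reported by -e above
/opt/rocm/bin/rocm-bandwidth-test -s 1 -d 2

# Bidirectional copy across all device combinations
/opt/rocm/bin/rocm-bandwidth-test -A
```

If the GPUs are linked over the bridge, the measured GPU-to-GPU bandwidth should be noticeably higher than what the same transfer achieves over PCIe alone.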

Another option:

Use rocm-smi (/opt/rocm/bin/rocm-smi) to show the GPU topology and print how the GPUs are connected (PCIe or XGMI):

For example:

rocm-smi --showtopo

…..

========================== Link Type between two GPUs ==========================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5
GPU0   0            PCIE         PCIE         PCIE         PCIE         PCIE
GPU1   PCIE         0            PCIE         PCIE         PCIE         PCIE
GPU2   PCIE         PCIE         0            PCIE         PCIE         PCIE
GPU3   PCIE         PCIE         PCIE         0            PCIE         PCIE
GPU4   PCIE         PCIE         PCIE         PCIE         0            PCIE
GPU5   PCIE         PCIE         PCIE         PCIE         PCIE         0

                                …
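When the bridge is active, the entries between the bridged GPUs should read XGMI instead of PCIE. A quick sketch of a check (assuming the --showtopo output format shown above):

```shell
#!/bin/sh
# Print only topology lines that mention an XGMI link; empty output
# means rocm-smi sees only PCIe connections between the GPUs
/opt/rocm/bin/rocm-smi --showtopo | grep -i xgmi
```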

Regards,

Guy


4 Replies


Thanks for the answer, Guy.

So, here is my problem:

Without the bridge, the system sees both MI100s:

========================== Link Type between two GPUs ==========================
       GPU0   GPU1
GPU0   0      PCIE
GPU1   PCIE   0

Instead, if I connect them with the bridge, the system no longer sees them at all.

The bridge is a 4-slot bridge, but I have only 2 MI100s.

Is it possible that it does not work because the 4-slot bridge is populated with only 2 MI100s?

Regards


Hi,

Unfortunately, I have confirmed that the Infinity Fabric bridge card requires 4 GPUs, and we do not have a bridge card for 2-GPU configurations. You would have to populate the other 2 GPUs for it to work.

Guy 

Zer0GriD
Adept I

What does the bridge do for performance, compared with not using it, if you did have 4 GPUs?

Asus Sage wrx80 plus, 5955wx pro, 512gb 3200 ram ecc, watercooled, 5tb {2tb 0 raid}, Radeon Pro w6800