tinux
Adept I

Radeon Pro VII FP64 performance on ROCm

Hello everyone

I upgraded my simulation GPU from a Radeon VII to a Radeon Pro VII in order to benefit from the higher FP64 computational performance. The Pro is advertised to achieve 6.5 TFLOPS, whereas the non-Pro should deliver between 2784 and 3458.5 GFLOPS according to Wikipedia. The Pro runs FP64 at a 1:2 rate relative to FP32, while the non-Pro is limited to 1:4, so these numbers make sense.
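For reference, these peak figures can be reproduced with a back-of-the-envelope calculation. This is only a sketch: the shader count (3840) and the clocks (1450 MHz for the Radeon VII, 1700 MHz for the Pro) are assumptions taken from public spec sheets, and it assumes a 1:4 FP64 rate on the Radeon VII and 1:2 on the Pro.

```python
# Theoretical peak FP64 throughput from shader count, clock and FP64 rate.
# Shader counts and clocks below are assumptions from public spec sheets,
# not values measured on the systems in this thread.

def peak_fp64_gflops(shaders: int, clock_mhz: float, fp64_divisor: int) -> float:
    """Peak FP64 GFLOPS.

    Each shader is assumed to retire one FP32 FMA (2 FLOPs) per cycle;
    FP64 runs at 1/fp64_divisor of the FP32 rate.
    """
    fp32_gflops = shaders * 2 * clock_mhz / 1000.0
    return fp32_gflops / fp64_divisor


radeon_vii = peak_fp64_gflops(3840, 1450, 4)      # assumed 1:4 FP64 rate
radeon_pro_vii = peak_fp64_gflops(3840, 1700, 2)  # assumed 1:2 FP64 rate

print(f"Radeon VII:     {radeon_vii:.0f} GFLOPS")
print(f"Radeon Pro VII: {radeon_pro_vii:.0f} GFLOPS")
```

With these inputs the lower Wikipedia bound (2784 GFLOPS) and the advertised 6.5 TFLOPS both come out of the same formula, so the difference between the cards is the FP64 rate rather than the shader count.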

I compared the performance of the two GPUs using a sequence of large complex-to-complex 3D FFTs with ROCm. Unfortunately, the performance is almost identical on similar hardware, with the non-Pro winning by a few percent.

Therefore, my question is how I can make use of the higher FP64 performance of the Radeon Pro VII for FP64 calculations.


15 Replies

@fsadough, the AMD Forums expert on professional GPU cards, might be able to answer that question.

fsadough
Moderator

  1. Describe your system info please, like motherboard, BIOS version, etc.
  2. Which OS and kernel version?
  3. Which ROCm version is installed?
  4. Which benchmark/instruction set? Provide the link please.

On both systems the installation was done according to this, and the OS was newly installed today. For both it is Ubuntu 20.04 LTS, since this is an officially supported ROCm distro.

The code that I used for benchmarking can be found here.

The non-Pro system:
- mainboard: Gigabyte X570 I AORUS PRO WIFI
- BIOS: F33g - 03/25/2021
- CPU: AMD Ryzen 5 3600X 6-Core Processor
- GPU: AMD Radeon VII
- OS/kernel: 5.8.0-50-generic #56~20.04.1-Ubuntu (fresh installation)
- ROCm: rock-dkms/Ubuntu 16.04,now 1:4.1-26


The Pro system:
- mainboard: AsRock B550M Steel Legend
- BIOS: P1.00 - 05/15/2020
- CPU: AMD Ryzen 9 3950X 16-Core Processor
- GPU: AMD Radeon Pro VII
- OS/kernel: 5.8.0-50-generic #56~20.04.1-Ubuntu (fresh installation)
- ROCm: rock-dkms/Ubuntu 16.04,now 1:4.1-26

 

I also tried the Pro GPU in the non-Pro system as written above, but I got the same (slow) performance.

Please provide output from the following commands:

  1. uname -r
  2. sudo lshw -c video
  3. apt show rocm-libs -a
  4. /opt/rocm/opencl/bin/clinfo
  5. /opt/rocm/bin/rocminfo
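To gather all five outputs in one file, something like the following could be used. It is a minimal sketch, not the get_data.sh script mentioned in this thread; the helper names and the output file name are hypothetical, and the `sudo` call will prompt for a password when run interactively.

```python
import subprocess

# The five commands requested above, as argument lists.
COMMANDS = [
    ["uname", "-r"],
    ["sudo", "lshw", "-c", "video"],
    ["apt", "show", "rocm-libs", "-a"],
    ["/opt/rocm/opencl/bin/clinfo"],
    ["/opt/rocm/bin/rocminfo"],
]


def run_command(cmd):
    """Run one command and return its combined stdout and stderr as text."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The tool is not installed at that path; record that instead of failing.
        return f"[not found: {cmd[0]}]\n"
    return result.stdout + result.stderr


def collect(commands, out_path):
    """Write each command line followed by its output to out_path."""
    with open(out_path, "w") as fh:
        for cmd in commands:
            fh.write(f"$ {' '.join(cmd)}\n")
            fh.write(run_command(cmd))
            fh.write("\n")
```

Calling `collect(COMMANDS, "system_info.txt")` on each machine produces one text file per system that can be attached to a reply.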

Please have a look at the updated pastebin repo. I used "get_data.sh" on each system and stored the corresponding outputs in the corresponding directories.


You are using kernel version 5.8.0-50-generic. Please downgrade to 5.6.

https://www.amd.com/en/support/kb/release-notes/rn-pro-lin-21-q1

Note: For Ubuntu 20.04.1, only Kernel 5.4/5.6 are supported. Latest Kernel version 5.8+ is not supported. Customers with Kernel 5.8+, need to downgrade to 5.4/5.6 for proper driver support.
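A quick way to check a machine against that note is sketched below. It assumes the supported kernel series for Ubuntu 20.04.1 are exactly the 5.4 and 5.6 named above; the function names are hypothetical.

```python
import platform
import re

# Kernel series named as supported for Ubuntu 20.04.1 in the note above.
SUPPORTED_SERIES = {(5, 4), (5, 6)}


def kernel_series(release: str):
    """Extract (major, minor) from a release string like '5.8.0-50-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported_kernel(release: str) -> bool:
    """True if the release belongs to one of the supported kernel series."""
    return kernel_series(release) in SUPPORTED_SERIES


current = platform.release()  # same string that `uname -r` prints on Linux
print(f"{current}: supported = {is_supported_kernel(current)}")
```

For the kernels in this thread, `is_supported_kernel("5.8.0-50-generic")` is false and `is_supported_kernel("5.6.0-1055-oem")` is true.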

 

https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#prerequisites

Prerequisites

The AMD ROCm platform is designed to support the following operating systems:

  • Ubuntu 20.04.1 (5.4 and 5.6-oem) and 18.04.5 (Kernel 5.4)

  • CentOS 7.9 (3.10.0-1127) & RHEL 7.9 (3.10.0-1160.6.1.el7) (Using devtoolset-7 runtime support)

  • CentOS 8.3 (4.18.0-193.el8) and RHEL 8.3 (4.18.0-193.1.1.el8) (devtoolset is not required)

  • SLES 15 SP2

I am getting the same performance on 5.6.0-1055-oem.


Using Radeon Pro VII system, can you please send a new set of the following info, so I can file a ticket?

  1. uname -r
  2. sudo lshw -c video
  3. apt show rocm-libs -a
  4. /opt/rocm/opencl/bin/clinfo
  5. /opt/rocm/bin/rocminfo

I'm sorry, I forgot that. You'll find the data here.

I have filed a ticket for our Software engineers

What timings are you getting with the large complex-to-complex 3D FFTs?


On the Radeon Pro VII: 16.0856 s

On the Radeon VII: 16.4581 s

 

EDIT: This is with this code.
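To put the two timings in perspective, the measured speedup can be compared with the ratio of the peak FP64 figures quoted in the opening post (6.5 TFLOPS against the upper Wikipedia figure of 3458.5 GFLOPS). This is just the arithmetic; it assumes both runs did identical work.

```python
# Measured wall-clock times from this thread, in seconds.
t_pro = 16.0856      # Radeon Pro VII
t_non_pro = 16.4581  # Radeon VII

# Peak FP64 figures quoted in the opening post, in GFLOPS.
peak_pro = 6500.0
peak_non_pro = 3458.5

measured_speedup = t_non_pro / t_pro
peak_ratio = peak_pro / peak_non_pro

print(f"Measured speedup (Pro over non-Pro): {measured_speedup:.3f}x")
print(f"Ratio of peak FP64 throughput:       {peak_ratio:.2f}x")
```

The measured gain is only a couple of percent, far from the roughly 1.9x that the peak figures would suggest if the benchmark were limited by FP64 arithmetic.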

Apparently FFT is memory bound, and that's why the Radeon VII and Radeon Pro VII perform the same: they both have a memory bandwidth of 1 TB/s. Try "VkFFT" instead.

https://github.com/DTolm/VkFFT 
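The memory-bound argument can be illustrated with a rough roofline estimate. Everything here is a sketch under stated assumptions: the grid size (512 per dimension) is hypothetical since the thread does not give one, the FLOP count uses the conventional 5 N log2 N estimate, the traffic model assumes three passes that each read and write the whole array once, and the bandwidth is taken as roughly 1 TB/s (1024 GB/s) from the spec sheets. The peak values are the ones quoted in the opening post.

```python
import math


def fft3d_intensity(n: int) -> float:
    """FLOPs per byte for an n^3 complex FP64 FFT under an assumed traffic model."""
    points = n ** 3
    flops = 5.0 * points * math.log2(points)  # conventional FFT FLOP estimate
    bytes_moved = 3 * 2 * 16 * points         # 3 passes, read + write, 16 B per element
    return flops / bytes_moved


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)


BANDWIDTH_GBS = 1024.0               # assumed HBM2 bandwidth for both cards
intensity = fft3d_intensity(512)     # hypothetical grid size

for name, peak in [("Radeon VII", 3458.5), ("Radeon Pro VII", 6500.0)]:
    limit = attainable_gflops(peak, BANDWIDTH_GBS, intensity)
    print(f"{name}: peak {peak:.0f} GFLOPS, attainable about {limit:.0f} GFLOPS")
```

Under these assumptions the FFT performs well under two FLOPs per byte moved, so the memory roof sits below the FP64 peak of both cards and both end up with the same attainable throughput.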

@fsadough Thank you so much!

I understand. I will run some tests on more normal computations and compare the performance between the pro and non-pro GPUs. It will take a bit of time though. I will give vkFFT a try. This tip is very helpful!
