Hello everyone
I upgraded my simulation GPU from a Radeon VII to a Radeon Pro VII to benefit from its higher FP64 performance. The Pro is advertised at 6.5 TFLOPS, whereas the non-Pro should reach between 2784 and 3458.5 GFLOPS according to Wikipedia. The non-Pro runs FP64 at 1:4 of its FP32 rate while the Pro runs it at 1:2, so these numbers make sense.
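The quoted peaks follow from simple arithmetic: stream processors × 2 FLOPs per cycle (fused multiply-add) × clock, scaled by the FP64:FP32 ratio. Below is a back-of-the-envelope sketch. The shader count (3840 on both cards) and the boost clocks (about 1750 MHz for the Radeon VII and 1700 MHz for the Radeon Pro VII) are assumed values and do not come from the post.

```python
# Back-of-the-envelope peak FP64 throughput for a GCN-style GPU.
# ASSUMED specs (not from the post): 3840 stream processors on both cards,
# approximate boost clocks of 1750 MHz (Radeon VII) and 1700 MHz (Radeon Pro VII).

def peak_gflops(shaders: int, clock_mhz: float, fp64_ratio: float) -> float:
    """Peak FP64 GFLOPS = shaders * 2 FLOPs/cycle (FMA) * clock (GHz) * FP64:FP32 ratio."""
    fp32_gflops = shaders * 2 * clock_mhz / 1000.0
    return fp32_gflops * fp64_ratio

radeon_vii = peak_gflops(3840, 1750, 1 / 4)      # FP64 at 1:4 of FP32
radeon_pro_vii = peak_gflops(3840, 1700, 1 / 2)  # FP64 at 1:2 of FP32

if __name__ == "__main__":
    print(f"Radeon VII FP64 peak:     {radeon_vii:.0f} GFLOPS")
    print(f"Radeon Pro VII FP64 peak: {radeon_pro_vii:.0f} GFLOPS")
    print(f"Pro / non-Pro ratio:      {radeon_pro_vii / radeon_vii:.2f}")
```

With these assumptions, the Pro's FP64 peak comes out at just under twice the non-Pro's, even though its clock is slightly lower.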
I compared the two GPUs using a sequence of large complex-to-complex 3D FFTs in ROCm. Unfortunately, on similar hardware the performance is almost identical, with the non-Pro ahead by a few percent.
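To compare FFT runs in a hardware-independent way, a measured time per transform can be converted into achieved GFLOP/s and effective memory bandwidth. This is a minimal sketch. The helper name `fft3d_metrics` is hypothetical. The FLOP count uses the common 5·N·log2(N) convention for an N-point complex FFT, and the traffic model assumes each pass reads and writes the whole double-complex array once. The actual number of passes depends on how the FFT library decomposes the transform.

```python
import math

def fft3d_metrics(n: int, seconds: float, passes: int = 3):
    """Return (GFLOP/s, GB/s) for one complex-to-complex float64 n x n x n FFT.

    FLOPs: 5 * N * log2(N) with N = n**3 (conventional FFT operation count).
    Bytes: ASSUMED model of `passes` full read + write sweeps over the array,
    16 bytes per complex128 element.
    """
    points = n ** 3
    flops = 5 * points * math.log2(points)
    bytes_moved = passes * 2 * points * 16
    return flops / seconds / 1e9, bytes_moved / seconds / 1e9

if __name__ == "__main__":
    # Replace `measured_seconds` with your own per-transform timing.
    measured_seconds = 1.0
    gflops, gbs = fft3d_metrics(512, measured_seconds)
    print(f"{gflops:.1f} GFLOP/s, {gbs:.1f} GB/s effective")
```

If the effective GB/s figure is close to the card's memory bandwidth while GFLOP/s stays far below the FP64 peak, the transform is limited by memory, not by arithmetic.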
Therefore, my question is: how can I make use of the Radeon Pro VII's higher FP64 performance?
@fsadough, the AMD Forums expert on professional GPU cards, might be able to answer this question.
On both systems, the installation was done according to this, and the OS was freshly installed today. Both run Ubuntu 20.04 LTS, since it is an officially supported ROCm distribution.
The code that I used for benchmarking can be found here.
The non-Pro system:
- mainboard: Gigabyte X570 I AORUS PRO WIFI
- BIOS: F33g - 03/25/2021
- CPU: AMD Ryzen 5 3600X 6-Core Processor
- GPU: AMD Radeon VII
- OS/kernel: 5.8.0-50-generic #56~20.04.1-Ubuntu (fresh installation)
- ROCm: rock-dkms/Ubuntu 16.04,now 1:4.1-26
The Pro system:
- mainboard: AsRock B550M Steel Legend
- BIOS: P1.00 - 05/15/2020
- CPU: AMD Ryzen 9 3950X 16-Core Processor
- GPU: AMD Radeon Pro VII
- OS/kernel: 5.8.0-50-generic #56~20.04.1-Ubuntu (fresh installation)
- ROCm: rock-dkms/Ubuntu 16.04,now 1:4.1-26
I also tried the Pro GPU in the non-Pro system described above and got the same (slow) performance.
Please provide output from the following commands:
Please have a look at the updated pastebin repo. I ran "get_data.sh" on each system and stored each output in the matching directory.
You are using kernel version 5.8.0-50-generic. Please downgrade to 5.6.
https://www.amd.com/en/support/kb/release-notes/rn-pro-lin-21-q1
Note: For Ubuntu 20.04.1, only Kernel 5.4/5.6 are supported. Latest Kernel version 5.8+ is not supported. Customers with Kernel 5.8+, need to downgrade to 5.4/5.6 for proper driver support.
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#prerequisites
Prerequisites
The AMD ROCm platform is designed to support the following operating systems:
Ubuntu 20.04.1 (5.4 and 5.6-oem) and 18.04.5 (Kernel 5.4)
CentOS 7.9 (3.10.0-1127) & RHEL 7.9 (3.10.0-1160.6.1.el7) (Using devtoolset-7 runtime support)
CentOS 8.3 (4.18.0-193.el8) and RHEL 8.3 (4.18.0-193.1.1.el8) (devtoolset is not required)
SLES 15 SP2
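The kernel requirement quoted above can be checked with a short script. This is a minimal sketch, not an AMD tool: the helper names are hypothetical, and the supported set is copied from the quoted prerequisites for Ubuntu 20.04.1 (kernels 5.4 and 5.6-oem). `platform.release()` returns the same string as `uname -r` on Linux.

```python
import platform
import re

# ASSUMED from the quoted ROCm prerequisites for Ubuntu 20.04.1: kernels 5.4 and 5.6(-oem).
SUPPORTED_UBUNTU_2004 = {(5, 4), (5, 6)}

def kernel_major_minor(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '5.8.0-50-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def is_supported(release: str) -> bool:
    """True if the kernel's major.minor version is in the supported set."""
    return kernel_major_minor(release) in SUPPORTED_UBUNTU_2004

if __name__ == "__main__":
    release = platform.release()  # same as `uname -r` on Linux
    print(release, "supported" if is_supported(release) else "NOT supported")
```

On the systems above, `5.8.0-50-generic` would be reported as not supported and `5.6.0-1055-oem` as supported.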
I am getting the same performance on 5.6.0-1055-oem.
Using the Radeon Pro VII system, could you please send a new set of the following info so I can file a ticket?
I'm sorry, I forgot that. You'll find the data here.
I have filed a ticket with our software engineers.
What numbers are you getting with the large complex-to-complex 3D FFTs?
Apparently FFT is memory-bound, and that's why the Radeon VII and Radeon Pro VII perform the same: both have a memory bandwidth of about 1 TB/s. Try "VkFFT" instead.
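The memory-bound argument can be made concrete with a roofline estimate: attainable throughput is the minimum of the FP64 peak and arithmetic intensity (FLOPs per byte) × memory bandwidth. This is a minimal sketch under stated assumptions: ~1024 GB/s for both cards, the FP64 peaks estimated from assumed boost clocks, a 512³ double-complex transform, and a hypothetical three passes over memory (one per dimension).

```python
import math

def fft3d_intensity(n: int, passes: int) -> float:
    """FLOPs per byte of DRAM traffic for a complex128 n^3 FFT.
    FLOPs: 5 * N * log2(N); bytes: ASSUMED `passes` full read + write sweeps."""
    points = n ** 3
    flops = 5 * points * math.log2(points)
    bytes_moved = passes * 2 * points * 16
    return flops / bytes_moved

def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline model: throughput is capped by compute or by memory bandwidth."""
    return min(peak_gflops, intensity * bandwidth_gbs)

BANDWIDTH_GBS = 1024.0  # ASSUMED: ~1 TB/s HBM2 on both cards
FP64_PEAKS = {"Radeon VII": 3360.0, "Radeon Pro VII": 6528.0}  # ASSUMED estimates

if __name__ == "__main__":
    ai = fft3d_intensity(512, passes=3)
    print(f"arithmetic intensity: {ai:.3f} FLOP/byte")
    for name, peak in FP64_PEAKS.items():
        print(f"{name}: {attainable_gflops(ai, peak, BANDWIDTH_GBS):.0f} GFLOP/s attainable")
```

Under these assumptions, both cards hit the same bandwidth ceiling, well below either FP64 peak. Even an ideal single-pass transform (about 4.2 FLOP/byte) would stay below the Pro VII's balance point of roughly 6.4 FLOP/byte. A more efficient FFT library can reduce the number of memory passes, but it cannot turn the extra FP64 units into FFT speed.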
@fsadough Thank you so much!
I understand. I will run some tests with more typical computations and compare the performance of the Pro and non-Pro GPUs, although it will take a bit of time. I will also give VkFFT a try. This tip is very helpful!
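For a test where the Pro VII's FP64 advantage should show, a workload needs high arithmetic intensity. A large double-precision matrix multiply (DGEMM) is the textbook case. This is a minimal sketch, not a benchmark. It assumes ideal blocking, so that A and B are read once and C is written once (2n³ FLOPs over 3·8·n² bytes). The machine balance uses the same assumed figures as above (about 6528 GFLOPS FP64 and 1024 GB/s).

```python
def dgemm_intensity(n: int) -> float:
    """FLOPs per byte for C = A @ B with n x n float64 matrices.
    ASSUMES ideal blocking: A and B read once, C written once."""
    return (2 * n ** 3) / (3 * 8 * n ** 2)

def is_compute_bound(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> bool:
    """A kernel is compute-bound when its intensity exceeds the machine balance."""
    return intensity > peak_gflops / bandwidth_gbs

if __name__ == "__main__":
    for n in (16, 256, 4096):
        ai = dgemm_intensity(n)
        bound = "compute" if is_compute_bound(ai, 6528.0, 1024.0) else "memory"
        print(f"n={n}: {ai:.1f} FLOP/byte -> {bound}-bound on the assumed Pro VII")
```

Because DGEMM intensity grows linearly with n (n/12 FLOP/byte here), large matrices should be limited by FP64 throughput rather than bandwidth. That is where the Pro VII would be expected to pull ahead of the Radeon VII.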