General Discussions

readonly · ‎02-03-2025

I have a very simple 3D in-place FFT code that uses FFTW with OpenMP multi-threading. I am trying to get the best performance on a Linux machine (Ubuntu, two sockets of AMD Genoa CPUs). I built it with the AMD compiler, AOCC 5.0, and AMD-FFTW (optimized with OpenMP and AVX-512) like this:

clang++ bench_fftw.cpp -o bench_fftw -fopenmp -march=znver4 -O3 -flto -mavx512 -ffast-math -L/opt/AMD/amd-fftw/lib -lfftw3f_omp -lfftw3f -lm -I/opt/AMD/amd-fftw/include

Regarding how to run it, I typically set

export OMP_NUM_THREADS=8 # one CCD / NUMA node

export OMP_PLACES=cores # use physical cores only

export OMP_PROC_BIND=close
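
One way to confirm that the runtime actually picked up these settings is to query it from inside the benchmark. The following is a minimal sketch, not part of the original bench_fftw.cpp; the helper name describe_omp_setup is hypothetical, and it assumes an OpenMP 4.5 or newer runtime when built with -fopenmp.

```cpp
// Sketch: report how the OpenMP runtime interpreted OMP_NUM_THREADS,
// OMP_PLACES and OMP_PROC_BIND. Falls back to a fixed message when the
// file is compiled without -fopenmp.
#include <sstream>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper; the name is an assumption made for illustration.
std::string describe_omp_setup() {
    std::ostringstream out;
#ifdef _OPENMP
    out << "max_threads=" << omp_get_max_threads()
        << " num_places=" << omp_get_num_places()
        << " proc_bind_close="
        << (omp_get_proc_bind() == omp_proc_bind_close ? "yes" : "no");
#else
    out << "max_threads=1 (built without OpenMP)";
#endif
    return out.str();
}
```

Printing the returned string once at startup, before any plan is created, shows whether the thread count and binding policy match what was exported in the shell.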

I also have a version that uses the MKL FFT interface. It is built with the Intel compiler icpx and MKL-FFT:

icpx bench_mkl.cpp -o bench_mkl -qopenmp -O3 -Ofast -ffast-math -axCORE-AVX2,CORE-AVX512 -qmkl

The binary built with icpx + MKL-FFT performs much better than the one built with AOCC + AMD-FFTW: it is almost twice as fast.
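
To put a gap like this on a common scale, both binaries can report throughput with the conventional 5 N log2(N) flop estimate for a complex FFT, where N is the total number of points. The sketch below assumes a complex-to-complex transform (the post does not say which kind is used); the function names are hypothetical and the callable passed to time_seconds stands in for whatever executes one transform in each benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Nominal flop count for one complex 3D FFT of size n0 x n1 x n2,
// using the conventional 5 * N * log2(N) estimate with N = n0*n1*n2.
double fft_flops(std::size_t n0, std::size_t n1, std::size_t n2) {
    const double n = static_cast<double>(n0) * static_cast<double>(n1) *
                     static_cast<double>(n2);
    return 5.0 * n * std::log2(n);
}

// Convert the wall time measured for `repeats` transforms into GFLOP/s.
double fft_gflops(std::size_t n0, std::size_t n1, std::size_t n2,
                  int repeats, double seconds) {
    assert(repeats > 0 && seconds > 0.0);
    return fft_flops(n0, n1, n2) * repeats / seconds * 1e-9;
}

// Time `repeats` calls of `run` after one untimed warm-up call.
// `run` is an assumption: a callable that executes one transform.
template <class F>
double time_seconds(F&& run, int repeats) {
    run();  // warm-up, excluded from the measurement
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) run();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Using the same problem size, repeat count, and warm-up policy in both bench_fftw and bench_mkl keeps plan creation out of the timed region, so the two GFLOP/s figures compare only the transform itself.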

Any advice on how to tune this code on AMD Genoa?

https://stackoverflow.com/questions/79410148/poor-perf-from-aoccamd-fftw-in-linux-with-amd-genoa-cpu...

gq

ajayrant

Hi @readonly,

Thanks for writing to the serverguru forum.

Sorry for the delay in responding.

We are currently investigating your issue at our end and will keep you updated.

Thanks & Regards

Ajay

RookieSideloader

To optimize FFT performance on AMD Genoa CPUs with AOCC and AMD-FFTW, try adjusting the compiler flags: remove -mavx512 and use -march=native or -mavx2 instead. Use FFTW wisdom for tuning, and check that FFTW actually uses its AVX2 or AVX-512 kernels. Tune the OpenMP settings by experimenting with OMP_PLACES=threads and an appropriate NUMA policy. Consider testing MKL-FFT with AOCC, or rocFFT if a GPU is available (rocFFT runs on GPUs, not on the CPU). Profiling with tools such as perf or AMD uProf can also help identify performance bottlenecks.

shrjoshi

Thank you for sharing the test case, compilation and run steps.

We are able to reproduce the issue at our end.

We are currently working on fixing this issue and will let you know once the fix is available.


Poor perf from aocc+amd-fftw in linux with AMD Genoa CPU (compared to built with Intel icpx+mkl)