ROCm Discussions

tinux · 02-28-2024

Hello everyone

I am developing a simulation tool for linear and nonlinear propagation of ultrashort laser pulses in 3D. It uses a split-step method to solve the generalized nonlinear Schrödinger equation. This is done by applying effects to a 3D array spanning time t and the two spatial axes x and y, where t is the slow axis, y the medium axis, and x the fast axis. Most effects are applied in one of the following spaces:

time/space,
frequency/space, and
frequency/reciprocal space,

which means I need to convert, i.e. Fourier-transform, between them, in order to apply certain effects.
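To make the axis ordering concrete, here is a minimal sketch of how an element (t, y, x) maps to a flat offset. It assumes a contiguous row-major buffer, which is my reading of "t slow, x fast"; the helper names `flat_index` and `strides` are made up for this illustration and are not part of rocFFT.

```python
def flat_index(it, iy, ix, ny, nx):
    """Offset of element (it, iy, ix) in a row-major (t, y, x) array.

    Assumption: contiguous storage with x varying fastest and t slowest.
    """
    return (it * ny + iy) * nx + ix


def strides(ny, nx):
    """Element strides for the (t, y, x) axes under the same assumption."""
    # One step in t skips a whole (y, x) plane, one step in y skips a row,
    # one step in x moves to the neighbouring element.
    return (ny * nx, nx, 1)


if __name__ == "__main__":
    ny, nx = 4, 8
    print("strides (t, y, x):", strides(ny, nx))
    print("offset of (1, 2, 3):", flat_index(1, 2, 3, ny, nx))
```

These are the strides a 1D FFT along t, or a 2D FFT over (y, x), would have to be given for an array laid out this way.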

For performance reasons, the entire code runs on a GPU, including the different FFTs, for which I use rocFFT.

I have now realized that the 3D FFT does not always give correct results when the dimensions of the 3D array become large. For debugging, I wrote some simple tests to see where this happens. You can find the code in this GitHub repo.

What I do there is:

define 1D, 2D, and 3D FFTs for handling the transforms between the spaces mentioned above,
initialize the data to a 3D step function and store it for plotting,
copy it to the GPU,
do a 3D FFT,
starting again from the initial 3D step function, do a 2D FFT in x and y, followed by a 1D FFT in t (this should give the same result as the 3D FFT above),
copy the data back and store it for plotting.
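The equivalence these steps rely on can be checked on the CPU with NumPy. This is only a sketch of the idea, not the test code from the repo: the array size and the box-shaped step function are placeholders chosen to keep it small.

```python
import numpy as np

# Small placeholder sizes (t, y, x); the sizes that fail are far larger.
nt, ny, nx = 8, 16, 32

# 3D step function: ones inside a centred box, zeros elsewhere.
data = np.zeros((nt, ny, nx), dtype=np.complex128)
data[nt // 4 : 3 * nt // 4, ny // 4 : 3 * ny // 4, nx // 4 : 3 * nx // 4] = 1.0

# Path A: a single 3D FFT over all three axes.
fft3d = np.fft.fftn(data, axes=(0, 1, 2))

# Path B: 2D FFT over (y, x), followed by a 1D FFT over t.
fft2d1d = np.fft.fft(np.fft.fft2(data, axes=(1, 2)), axis=0)

# The two paths should agree up to floating-point rounding.
max_diff = np.max(np.abs(fft3d - fft2d1d))
paths_agree = np.allclose(fft3d, fft2d1d)
print(f"max |3D - (2D+1D)| = {max_diff:.3e}, agree: {paths_agree}")
```

Because the multidimensional DFT is separable, both paths compute the same transform, so any difference beyond rounding error points to a problem in one of the transforms rather than in the math.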

Now, when I plot the 3D FFT, the 2D+1D FFT, and the 3D FFT computed with NumPy, the results typically look identical, unless the dimensions get too large. You can find a Python notebook in the repo mentioned above that illustrates all this. The notebook lists a number of 3D array sizes that do not work correctly.

For instance, for 3D array sizes (t, y, x) of

[2^8, 2^8, 2^11]: everything looks good,
[2^8, 2^8, 2^12]: the 3D FFT is incorrect, but the 2D+1D FFT seems correct,
[2^8, 2^12, 2^8]: both the 3D FFT and the 2D+1D FFT are incorrect.

It seems to fail once the combined lengths of the different dimensions exceed a certain value, but I cannot pinpoint where the threshold is. What I find confusing is that

[2^4, 2^12, 2^8]: works,
[2^8, 2^12, 2^8]: fails only in the 3D FFT case,
[2^8, 2^8, 2^12]: fails in both the 3D and the 2D+1D FFT cases.
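To help narrow down the threshold, here is a quick tally of the total element count and buffer size for the sizes quoted in this post. It is only arithmetic; the 16 bytes per element is an assumption (double-precision complex), so adjust `BYTES_PER_ELEMENT` if the data is single precision.

```python
# Sizes (t, y, x) quoted in this post, given as exponents of two.
sizes_log2 = [(8, 8, 11), (8, 8, 12), (8, 12, 8), (4, 12, 8)]

# Assumption: complex double, i.e. 2 x 8 bytes per element.
BYTES_PER_ELEMENT = 16


def tally(log2_dims, bytes_per_element=BYTES_PER_ELEMENT):
    """Return (element count, buffer size in bytes) for one size triple."""
    elements = 1 << sum(log2_dims)
    return elements, elements * bytes_per_element


for dims in sizes_log2:
    elements, nbytes = tally(dims)
    label = ", ".join(f"2^{p}" for p in dims)
    print(f"[{label}]: 2^{sum(dims)} elements, {nbytes / 2**30:g} GiB")
```

Under that assumption, the two sizes with 2^28 elements come to 4 GiB per buffer, the 2^27 case to 2 GiB, and the 2^24 case to 0.25 GiB. This does not show that the total size is the cause, only where the quoted cases sit relative to each other.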

All this was tested on 3 systems with 3 different GPUs:

system 1
- Arch Linux
- Kernel 6.7.5-arch1-1
- ROCm 6.0.0
- GPU: RX 7900 XTX
system 2
- Arch Linux
- Kernel 6.7.4-arch1-1
- ROCm 6.0.0
- GPU: Radeon VII Pro
system 3
- Arch Linux
- Kernel 6.7.6-arch1-1
- ROCm 6.0.0
- GPU: RX 6900 XT

I would appreciate it if someone could test this.

Thanks!

[rocfft] incorrect results for certain (large) dimensions in 3D FFTs