Hello everyone
I am developing a simulation tool for linear and nonlinear propagation of ultrashort laser pulses in 3D. It uses a split-step method to solve the generalized nonlinear Schrödinger equation. This is done by applying effects to a 3D array spanning time t and the two spatial axes x and y, where t is the slowest axis, y the middle axis, and x the fastest axis. Most effects are applied in one of the following spaces:
- time/space,
- frequency/space, and
- frequency/reciprocal space,
which means I need to convert, i.e. Fourier-transform, between them, in order to apply certain effects.
For performance reasons, the entire code is executed on a GPU, including the different FFTs, for which I use rocFFT.
I have now realized that the 3D FFT does not always give correct results when the array dimensions become large. For debugging, I wrote some simple tests to see where this happens. You can find the code in [this github repo](https://github.com/t1nux/roc_fft_bug).
What I do there is:
- define 1D, 2D, and 3D FFTs for handling the transforms between the spaces mentioned above,
- initialize the data to a 3D step function and store it for plotting,
- copy it to the GPU,
- do a 3D FFT,
- do a 2D FFT in x and y, followed by a 1D FFT in t (this should give the same result as the 3D FFT),
- copy the data back and store it for plotting.
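The expected equivalence between the 3D FFT and the 2D+1D sequence can be checked on the CPU with NumPy. This is only a minimal sketch, not the code from the repo: the small array size and the exact shape of the step function are assumptions chosen so that it runs quickly, and the array is ordered (t, y, x) as described above.

```python
import numpy as np

# Small, hypothetical test size (t, y, x); the sizes in the post are far larger.
nt, ny, nx = 8, 16, 32

# 3D step function: ones in the central block, zeros elsewhere (assumed shape).
data = np.zeros((nt, ny, nx), dtype=np.complex128)
data[nt // 4 : 3 * nt // 4, ny // 4 : 3 * ny // 4, nx // 4 : 3 * nx // 4] = 1.0

# Reference: one 3D FFT over all three axes.
full_3d = np.fft.fftn(data, axes=(0, 1, 2))

# Split version: 2D FFT over (y, x), followed by a 1D FFT over t.
split = np.fft.fft(np.fft.fft2(data, axes=(1, 2)), axis=0)

# For a correct FFT, both routes agree up to floating-point rounding.
max_abs_diff = np.max(np.abs(full_3d - split))
print(f"max |3D - (2D+1D)| = {max_abs_diff:.3e}")
```

Because the multidimensional DFT is separable, any difference beyond rounding error between the two routes points to a problem in one of the transforms rather than in the input data.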
Now, when I plot the 3D FFT, the 2D+1D FFT, and the 3D FFT computed with NumPy, the results typically look identical, unless the dimensions get too large. You can find a Python notebook in the repo mentioned above that illustrates all this.
For instance, for 3D array sizes (t, y, x) of
- [2^8, 2^8, 2^11]: everything looks good,
- [2^8, 2^8, 2^12]: the 3D FFT is incorrect, but the 2D+1D FFT seems correct,
- [2^8, 2^12, 2^8]: both the 3D and 2D+1D FFT are incorrect.
It seems that it fails if the combined length of the different dimensions exceeds a certain value, but I cannot pinpoint where the threshold is. What I find confusing is that
- [2^4, 2^12, 2^8]: works,
- [2^8, 2^12, 2^8]: fails only in the 3D FFT case,
- [2^8, 2^*, 2^12]: fails in both the 3D and 2D+1D FFT cases.
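To help locate the threshold, the total element count and buffer size of each fully specified case can be tabulated. This is a rough sketch under assumptions: the helper names are hypothetical, and the element sizes (8 bytes for single-precision complex, 16 bytes for double-precision complex) are guesses, since the precision used is not stated here.

```python
from math import prod

# Sizes (t, y, x) given as exponents of two, copied from the lists above.
cases = {
    "[2^8, 2^8, 2^11]": (8, 8, 11),
    "[2^8, 2^8, 2^12]": (8, 8, 12),
    "[2^8, 2^12, 2^8]": (8, 12, 8),
    "[2^4, 2^12, 2^8]": (4, 12, 8),
}


def total_elements(exponents):
    """Total number of array elements for dimensions 2^e."""
    return prod(2**e for e in exponents)


def size_gib(exponents, bytes_per_element):
    """Buffer size in GiB for an assumed element size in bytes."""
    return total_elements(exponents) * bytes_per_element / 2**30


for label, exps in cases.items():
    n = total_elements(exps)
    print(
        f"{label}: 2^{sum(exps)} = {n} elements, "
        f"{size_gib(exps, 8):g} GiB at 8 B/element, "
        f"{size_gib(exps, 16):g} GiB at 16 B/element"
    )
```

Among the sizes listed, the cases reported as failing have 2^28 elements, while the working ones have 2^27 or fewer. That is only an observation about these examples, not a diagnosis.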
All this was tested on two systems:
- system 1
  - Arch Linux
  - Kernel 6.7.5-arch1-1
  - ROCm 6.0.0
  - GPU: RX 7900 XTX
- system 2
  - Arch Linux
  - Kernel 6.7.4-arch1-1
  - ROCm 6.0.0
  - GPU: Radeon VII Pro
I would appreciate it if someone could test this.
Thanks!
EDIT: Typo