Hello everyone
I am developing a simulation tool for linear and nonlinear propagation of ultrashort laser pulses in 3D. It uses a split-step method to solve the generalized nonlinear Schrödinger equation. The effects are applied to a 3D array with one time axis t and two spatial axes x and y, where t is the slowest-varying axis, x the fastest, and y lies in between. Most effects are applied in one of the following spaces:
- time/space,
- frequency/space, and
- frequency/reciprocal space,
so I need to convert between them, i.e. Fourier-transform the data, before certain effects can be applied.
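The conversions between the three spaces can be sketched on the CPU with NumPy. This sketch assumes a complex array of shape (Nt, Ny, Nx) in C (row-major) order, so that x is the fastest-varying and t the slowest axis. It uses `numpy.fft` rather than rocFFT, and the helper names are hypothetical.

```python
import numpy as np

def to_frequency_space(field):
    """time/space -> frequency/space: 1D FFT along the slow t axis (axis 0)."""
    return np.fft.fft(field, axis=0)

def to_reciprocal_space(spectrum):
    """frequency/space -> frequency/reciprocal space: 2D FFT over y and x."""
    return np.fft.fft2(spectrum, axes=(1, 2))

def from_reciprocal_space(spectrum_k):
    """frequency/reciprocal space -> frequency/space (inverse 2D FFT)."""
    return np.fft.ifft2(spectrum_k, axes=(1, 2))

def to_time_space(spectrum):
    """frequency/space -> time/space (inverse 1D FFT along t)."""
    return np.fft.ifft(spectrum, axis=0)

nt, ny, nx = 8, 4, 16
rng = np.random.default_rng(0)
field = rng.standard_normal((nt, ny, nx)) + 1j * rng.standard_normal((nt, ny, nx))
field = np.ascontiguousarray(field, dtype=np.complex128)

# In C order the x axis has the smallest stride (bytes), t the largest.
print(field.strides)  # (1024, 256, 16) for complex128 with this shape

# Element (it, iy, ix) lives at flat index (it * ny + iy) * nx + ix.
it, iy, ix = 3, 2, 5
assert field.ravel()[(it * ny + iy) * nx + ix] == field[it, iy, ix]

# Forward to frequency/reciprocal space and back again.
k_space = to_reciprocal_space(to_frequency_space(field))
roundtrip = to_time_space(from_reciprocal_space(k_space))
assert np.allclose(roundtrip, field)
```

Because transforms along different axes commute, applying the 1D FFT in t before or after the 2D FFT in (y, x) gives the same result as a single 3D FFT over all axes; the 2D+1D comparison relies on this property.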
For performance reasons, the entire code runs on a GPU, including the FFTs, for which I use rocFFT.
I have now noticed that the 3D FFT does not always give correct results when the array dimensions become large. For debugging, I wrote some simple tests to find out where this happens. You can find the code in this GitHub repo.
What I do there is:
- define 1D, 2D, and 3D FFTs for handling the transforms between the spaces mentioned above,
- initialize the data to a 3D step function and store it for plotting,
- copy it to the GPU,
- do a 3D FFT,
- starting again from the initial 3D step function, do a 2D FFT in x and y followed by a 1D FFT in t (this should give the same result as the 3D FFT above),
- copy the data back and store it for plotting.
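The test steps above have a direct CPU counterpart in NumPy, which can also serve as the reference for the GPU output. This is a minimal sketch, assuming complex double-precision data in (t, y, x) order. The box-shaped step function and the helper names are hypothetical stand-ins for what the repo actually uses.

```python
import numpy as np

def step_function(nt, ny, nx, frac=0.25):
    """3D step: ones in a centred box, zeros elsewhere (complex128, C order).

    frac (box half-width relative to each length) is a hypothetical choice.
    """
    field = np.zeros((nt, ny, nx), dtype=np.complex128)
    ht, hy, hx = (max(1, int(n * frac)) for n in (nt, ny, nx))
    ct, cy, cx = nt // 2, ny // 2, nx // 2
    field[ct - ht:ct + ht, cy - hy:cy + hy, cx - hx:cx + hx] = 1.0
    return field

def fft_3d(field):
    """Reference 3D FFT over all three axes (t, y, x)."""
    return np.fft.fftn(field, axes=(0, 1, 2))

def fft_2d_then_1d(field):
    """2D FFT over (y, x), followed by a 1D FFT along t."""
    return np.fft.fft(np.fft.fft2(field, axes=(1, 2)), axis=0)

def max_rel_error(result, reference):
    """Largest absolute deviation, normalised by the peak of the reference."""
    return np.max(np.abs(result - reference)) / np.max(np.abs(reference))

field = step_function(2**4, 2**5, 2**6)
ref = fft_3d(field)
err = max_rel_error(fft_2d_then_1d(field), ref)
print(f"2D+1D vs 3D: max relative error = {err:.2e}")
assert err < 1e-12
```

To check the GPU output numerically rather than only visually, the array copied back from the device could be passed to `max_rel_error` in place of the 2D+1D result. For single-precision data, a correspondingly looser tolerance would be needed.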
Now, when I plot the 3D FFT, the 2D+1D FFT, and the 3D FFT computed with NumPy, the results typically look identical, unless the dimensions get too large. You can find a Python notebook in the repo mentioned above that illustrates all this. The notebook also lists a number of 3D array sizes that do not work correctly.
For instance, for 3D array sizes (t, y, x) of
- [2^8, 2^8, 2^11]: everything looks good,
- [2^8, 2^8, 2^12]: the 3D FFT is incorrect, but the 2D+1D FFT seems correct,
- [2^8, 2^12, 2^8]: both the 3D and the 2D+1D FFT are incorrect.
It seems to fail once the overall array size exceeds a certain value, but I cannot pinpoint where the threshold is. What I found confusing is that
- [2^4, 2^12, 2^8]: works
- [2^8, 2^12, 2^8]: only does not work in the 3D FFT case
- [2^8, 2^8, 2^12]: does not work for 3D or 2D+1D FFT cases
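To narrow down the threshold, the reported sizes can be tabulated by element count and buffer size. This sketch assumes complex single- or double-precision data, since the precision is not stated above. In the reported cases, every correct size has at most 2^27 elements and every incorrect one has 2^28. As complex float, a 2^28-element buffer is 2^31 bytes, one more than the largest signed 32-bit integer. That coincidence is only a hypothesis worth testing, for example with sizes between 2^27 and 2^28 elements, not a confirmed cause.

```python
import math

# Sizes (t, y, x) reported above and whether every transform came out correct.
reported = [
    ((2**4, 2**12, 2**8), True),
    ((2**8, 2**8, 2**11), True),
    ((2**8, 2**8, 2**12), False),
    ((2**8, 2**12, 2**8), False),
]

INT32_MAX = 2**31 - 1

def size_summary(shape):
    """Element count and buffer size in bytes for complex float / complex double."""
    n = math.prod(shape)
    return {
        "elements": n,
        "log2_elements": n.bit_length() - 1,  # exact for powers of two
        "bytes_c64": 8 * n,    # complex float (assumed precision)
        "bytes_c128": 16 * n,  # complex double (assumed precision)
    }

for shape, ok in reported:
    s = size_summary(shape)
    print(
        f"{shape}: 2^{s['log2_elements']} elements, "
        f"{s['bytes_c64'] / 2**30:.3f} GiB as c64, "
        f"{s['bytes_c128'] / 2**30:.3f} GiB as c128, "
        f"c64 buffer > INT32_MAX: {s['bytes_c64'] > INT32_MAX}, "
        f"{'correct' if ok else 'incorrect'}"
    )
```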
All this was tested on 3 systems with 3 different GPUs:
- system 1
  - Arch Linux
  - Kernel 6.7.5-arch1-1
  - ROCm 6.0.0
  - GPU: RX 7900 XTX
- system 2
  - Arch Linux
  - Kernel 6.7.4-arch1-1
  - ROCm 6.0.0
  - GPU: Radeon VII Pro
- system 3
  - Arch Linux
  - Kernel 6.7.6-arch1-1
  - ROCm 6.0.0
  - GPU: RX 6900 XT
I would appreciate it if someone could test this.
Thanks!