Not sure if this is answering the same question you are asking, but I'll try.
The first thing to understand is that ACML uses FORTRAN matrix ordering, column order and starting with 1. This is different from C where data is stored in row order and indexes start at 0.
For a 2D matrix X(M,N), each column of data is stored sequentially. For instance X(2,1) is stored at the next memory address after X(1,1). X(1,2) is stored one column away from X(1,1). The documentation explains this by showing that X(i,j) is stored at i + (j-1)*M.
Extending to 3D with X(L,M,N), X(2,1,1) is stored next to X(1,1,1), X(1,2,1) is one column of length L away, and X(1,1,2) is stored one L*M "slice" away in memory. The documentation explains that X(i,j,k) is stored at i + (j-1)*L + (k-1)*L*M.
The 3D FFTs proceed in 4 steps. First, 2D transforms are done on each L*M slice (visualize plane on L and M stacked on top of each other). There are N of these 2D transforms. Next the entire array is transposed, such that L,M,N -> N,L,M. The data that used to be on the N axis is now stored sequentially in memory.
The third step is to perform L*M 1D transforms of length N on each column of this new array. The transform is done first because doing an FFT on data with very large strides is extremely inefficient. (Each of the 1D transforms operates on N values all stored sequentially in memory.)
The fourth step is to un-transform the data. And I think this is the answer you are looking for, which is yes, the data is put back into its original order.