I am using clAmdFft version 1.8.239 (Win7 64-bit).
Using this command
clAmdFft.Client -x 4096 -y 1 -z 1 --inLayout 2 --outLayout 2 -d
the library generates the file clAmdFft.kernel.Stockham1.cl.
Now let's look at the end of the function FwdPass0:
if(rw)
{
bufOutRe[outOffset + ( ((2*me + 0)/1)*8 + (2*me + 0)%1 + 0 )*1] = B2C0R0.s0; //line 4234
// more lines like these
bufOutIm[outOffset + ( ((2*me + 1)/1)*8 + (2*me + 1)%1 + 7 )*1] = B2C0I7.s1; //line 4265
}
rw is always 1.
bufOutRe points to &lds[0], where lds is a __local float array.
bufOutIm points to &lds[4096].
outOffset is always 0.
me is the local id that goes from 0 to 255.
So for me=255 the last index is ((2*255 + 1)/1)*8 + (2*255 + 1)%1 + 7 = 511*8 + 0 + 7 = 4095.
That means the real part is written to lds[0] through lds[4095] and ends just below the imaginary part, which starts at lds[4096].
The imaginary part in turn reaches lds[4096 + 4095] = lds[8191], so lds has to hold at least 8192 floats for these writes to stay in bounds.
The same index pattern is also found in the function InvPass0.
The test reports "PASS", which fits these bounds as long as lds is declared with at least 8192 elements.
I don't know if I'm missing something.
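
As a sanity check on the index arithmetic, the expression from FwdPass0 can be evaluated on the host for every work-item. This is only a sketch in plain C, not part of the generated kernel. The helper names write_index and max_write_index are made up for illustration, and I am assuming that the elided "more lines like these" use the same pattern with 2*me + 0 or 2*me + 1 and a trailing offset from 0 to 7.

```c
#include <assert.h>

/* Index pattern copied from FwdPass0:
 *   ((2*me + c)/1)*8 + (2*me + c)%1 + k
 * me: local id, c: 0 or 1, k: trailing offset 0..7.
 * The names c and k are illustrative, they do not appear in the kernel. */
int write_index(int me, int c, int k)
{
    return ((2 * me + c) / 1) * 8 + (2 * me + c) % 1 + k;
}

/* Largest index produced by work-items 0 .. work_items-1,
 * assuming every write in the block follows the pattern above. */
int max_write_index(int work_items)
{
    int max = 0;

    assert(work_items > 0);
    for (int me = 0; me < work_items; ++me) {
        for (int c = 0; c < 2; ++c) {
            for (int k = 0; k < 8; ++k) {
                int idx = write_index(me, c, k);
                if (idx > max)
                    max = idx;
            }
        }
    }
    return max;
}
```

Calling max_write_index(256) from a small main gives 4095, because anything % 1 is 0 and 511*8 + 7 = 4095. With bufOutIm starting at lds[4096], the highest element touched would then be lds[8191].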