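void FFT::Init(__global float *real, __global float *imag, __local float *Localds)
{
    gr = real;                  // global real-part buffer
    gi = imag;                  // global imaginary-part buffer
    lds = Localds;              // local (LDS) scratch memory
    gid = get_global_id(0);
    me = gid & 0x3fU;           // lane within the 64-work-item group (gid % 64)
    dg = (gid >> 6) * VSTRIDE;  // offset of this group's data: (gid / 64) * VSTRIDE
    gr += dg;
    gi += dg;
}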
In the host program, globalThreads[0] = 64, so gid only ranges over 0..63 and gid >> 6 is always 0. Therefore dg (the mp step) is always 0, and every launch runs the full 64x1024 FFT on the same base pointers, overwriting the previous results. I am sure this is not intentional. Please fix.