Hi, everyone

I'm implementing an algorithm that computes the Chrestenson spectrum, whose values are complex numbers.

First, I implemented the following kernel for P = 3 with hard-coded factors, and it returns correct results:

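```c
__kernel void fastVCT(__global float2 *vector,
                      uint step)
{
    const uint tid = get_global_id(0);

    // each work-item computes one radix-3 butterfly on elements op1, op2, op3
    uint op1 = tid % step + 3 * step * (tid / step);
    uint op2 = op1 + step;
    uint op3 = op2 + step;

    float2 u = vector[op1];
    float2 v = vector[op2];
    float2 w = vector[op3];

    vector[op1] = u + v + w;
    // u + W*v + W^2*w with W = -0.5 - 0.866025i (real part in .x, imaginary part in .y)
    vector[op2] = (float2)(u.x - 0.5f*v.x + 0.866025f*v.y - 0.5f*w.x - 0.866025f*w.y,
                           u.y - 0.866025f*v.x - 0.5f*v.y + 0.866025f*w.x - 0.5f*w.y);
    // u + W^2*v + W*w
    vector[op3] = (float2)(u.x - 0.5f*v.x - 0.866025f*v.y - 0.5f*w.x + 0.866025f*w.y,
                           u.y + 0.866025f*v.x - 0.5f*v.y - 0.866025f*w.x - 0.5f*w.y);
}
```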

Afterwards, I wrote a more general version of the kernel. The parameter P is passed as a compiler option of the form "-DNAME=SIZE" (for example, "-DP=3"), and the multiplication factors are precomputed in the host program and passed in as *factor*:

```c
__kernel void fastVCT(__global float2 *vector,
                      __global float2 *factor,
                      const unsigned int step)
{
    const unsigned int tid = get_global_id(0);
    unsigned int i = 0, j = 0;
    float2 u[P];
    float2 vec[P];
    const unsigned int op = tid % step + P * step * (tid / step);

    for (i = 0; i < P; i++)
    {
        u[i] = vector[op + i * step]; //fetching the appropriate values from global buffer
        vec[i] = (float2)(0.0f);      //just to initialize values since it will be used as accumulator
    }

    for (i = 0; i < P; i++)
    {
        for (j = 0; j < P; j++)
        {
            vec[i] += factor[i * P + j] * u[j]; //doing the actual computation
        }
    }

    for (i = 0; i < P; i++)
    {
        vector[op + i * step] = vec[i]; //writing the final results back to global buffer
    }
}
```

The problem with this second version of the kernel is that every imaginary part in *vector* is returned as 0.0f, while the real parts are computed correctly. I checked the values of *vector* and *factor* that are passed in, and they seem correct.

I would greatly appreciate it if someone could point me to a possible source of the error; I've been trying to find it for quite a while now without success.

I found the error: I didn't implement the complex multiplication in the vec[] accumulation correctly. It should be:

```c
vec[i] += (float2)(factor[i * P + j].x * u[j].x - factor[i * P + j].y * u[j].y,
                   factor[i * P + j].x * u[j].y + factor[i * P + j].y * u[j].x);
```

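since the product of the two imaginary components includes i * i = -1 and therefore contributes, with a negative sign, to the real part. The `*` operator on two float2 values multiplies component-wise, which is not complex multiplication.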