Hi, everyone
I'm implementing an algorithm that computes the Chrestenson spectrum, whose values are complex numbers.
First I implemented the following kernel with hard-coded values and it returns correct results:
__kernel void fastVCT(__global float2 *vector,
                      uint step)
{
    const uint tid = get_global_id(0);
    // indices of the three operands of this work-item's butterfly
    uint op1 = tid % step + 3 * step * (tid / step);
    uint op2 = op1 + step;
    uint op3 = op2 + step;
    float2 u = vector[op1];
    float2 v = vector[op2];
    float2 w = vector[op3];
    vector[op1] = u + v + w;
    vector[op2] = u - 0.5f*v.x + 0.866025f*v.y - 0.5f*w.x - 0.866025f*w.y;
    vector[op3] = u - 0.5f*v.x - 0.866025f*v.y - 0.5f*w.x + 0.866025f*w.y;
}
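To make the index arithmetic above concrete, here is a small host-side sketch in plain C (it is not part of the kernel). The helper name butterfly_base is made up for illustration; it mirrors the op1 expression, with the hard-coded 3 turned into a radix argument.

```c
#include <assert.h>

/* First element handled by work-item `tid` in a butterfly of `radix`
   operands that lie `step` elements apart. Hypothetical helper that
   mirrors the kernel's expression for op1. */
static unsigned butterfly_base(unsigned tid, unsigned step, unsigned radix)
{
    assert(step > 0); /* tid % step and tid / step need a non-zero step */
    return tid % step + radix * step * (tid / step);
}
```

For example, with step = 3 and radix = 3, work-item 4 gets base index 10, so it reads and writes elements 10, 13 and 16.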
Next, I wrote a more general version of the kernel. The parameter P is passed through the compiler option "-DNAME=VALUE", for example "-DP=3", and the multiplication factors are pre-computed in the host program and passed in as factor:
__kernel void fastVCT(__global float2 *vector,
                      __global float2 *factor,
                      const unsigned int step)
{
    const unsigned int tid = get_global_id(0);
    unsigned int i = 0, j = 0;
    float2 u[P];
    float2 vec[P];
    const unsigned int op = tid % step + P * step * (tid / step);
    for (i = 0; i < P; i++)
    {
        u[i] = vector[op + i * step]; //fetching the appropriate values from global buffer
        vec[i] = (float2)(0.0f); //just to initialize values since it will be used as accumulator
    }
    for (i = 0; i < P; i++)
    {
        for (j = 0; j < P; j++)
        {
            vec[i] += factor[i * P + j] * u[j];
        }
    }
    for (i = 0; i < P; i++)
    {
        vector[op + i * step] = vec[i]; //writing the final results back to global buffer
    }
}
The problem with this second version of the kernel is that all of the imaginary parts in vector are returned as 0.0f, while the real part is computed correctly. I checked the values for vector and factor that are being passed and they seem correct.
I would greatly appreciate it if someone could point me to a possible source of the error, since I've been trying to find it for quite a while now without success.
There should be nothing wrong with your array stuff from what I can see.
My guess is something to do with the host code, and perhaps the initialisation of factor. I know you said you checked this though ...
I found the error - I didn't implement the equation for the computation of vec[] correctly. It should be:
vec += (float2)(ch[i * P + j].x * op
since multiplying two imaginary components involves i * i = -1, so that product contributes to the real part.
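For anyone hitting the same thing, here is a minimal sketch of the difference, written in plain C so it can be checked on the host. The type cfloat and the helpers mul_componentwise and cmul are hypothetical names, not OpenCL built-ins; cfloat just stands in for float2 with .x as the real part and .y as the imaginary part. As far as I can tell, the * operator on two float2 values behaves like mul_componentwise, which would explain imaginary parts of 0.0f whenever the input data is purely real.

```c
/* Stand-in for OpenCL's float2: x is the real part, y the imaginary part. */
typedef struct { float x, y; } cfloat;

/* Component-wise product, i.e. what `a * b` computes for two float2 values:
   each component is multiplied on its own, with no cross terms. */
static cfloat mul_componentwise(cfloat a, cfloat b)
{
    cfloat r = { a.x * b.x, a.y * b.y };
    return r;
}

/* Complex product (a.x + i*a.y) * (b.x + i*b.y), using i * i = -1:
   real part      = a.x*b.x - a.y*b.y
   imaginary part = a.x*b.y + a.y*b.x */
static cfloat cmul(cfloat a, cfloat b)
{
    cfloat r = { a.x * b.x - a.y * b.y,
                 a.x * b.y + a.y * b.x };
    return r;
}
```

With a factor of (-0.5, 0.75) and a purely real input (2, 0), the component-wise product gives (-1, 0), while the complex product gives (-1, 1.5). In the kernel the accumulation would then use the complex product in place of the plain * on float2.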