2 Replies Latest reply on Feb 20, 2012 3:13 PM by marblecanyon

    Computation of the imaginary part of the Chrestenson spectrum

    marblecanyon

      Hi everyone,

      I'm implementing an algorithm to compute the Chrestenson spectrum, whose coefficients are complex numbers.

      First, I implemented the following kernel for P = 3 with hard-coded values, and it returns correct results:

      __kernel void fastVCT(__global float2 *vector,
                            uint step)
      {
        const uint tid = get_global_id(0);

        // indices of the three operands of this radix-3 butterfly
        uint op1 = tid % step + 3 * step * (tid / step);
        uint op2 = op1 + step;
        uint op3 = op2 + step;

        // each float2 holds one complex value as (re, im)
        float2 u = vector[op1];
        float2 v = vector[op2];
        float2 w = vector[op3];

        // 0.866025f is sqrt(3)/2; the f suffix keeps the constants single precision
        vector[op1] = u + v + w;
        vector[op2] = u - 0.5f*v.x + 0.866025f*v.y - 0.5f*w.x - 0.866025f*w.y;
        vector[op3] = u - 0.5f*v.x - 0.866025f*v.y - 0.5f*w.x + 0.866025f*w.y;
      }

      Next, I wrote a more general version of the kernel. The parameter P is passed as a compiler option of the form "-DNAME=SIZE" (for example "-DP=3"), and the multiplication factors are pre-computed in the host program and passed in the factor buffer:

      __kernel void fastVCT(__global float2 *vector,
                            __global float2 *factor,
                            const unsigned int step)
      {
        const unsigned int tid = get_global_id(0);

        unsigned int i = 0, j = 0;
        float2 u[P];
        float2 vec[P];

        // index of the first of the P operands handled by this work-item
        const unsigned int op = tid % step + P * step * (tid / step);

        for (i = 0; i < P; i++)
        {
          u[i] = vector[op + i * step];   // fetch the operands from the global buffer
          vec[i] = (float2)(0.0f);        // zero the accumulators
        }

        for (i = 0; i < P; i++)
        {
          for (j = 0; j < P; j++)
          {
            vec[i] += factor[i * P + j] * u[j];   // the actual computation
          }
        }

        for (i = 0; i < P; i++)
        {
          vector[op + i * step] = vec[i];   // write the results back to the global buffer
        }
      }

      The problem with this second version of the kernel is that every imaginary part in vector comes back as 0.0f, while the real parts are computed correctly. I have checked the values passed in vector and factor, and they appear to be correct.

      I would greatly appreciate it if someone could point me to a possible source of the error; I've been trying to find it for quite a while without success.