Problem with simple kernel, create matrix with indices

Discussion created by Peterp on May 27, 2009
Latest reply on May 27, 2009 by Peterp


I have a problem with a very simple kernel. The kernel takes three parameters: the first is a stream of int2 values that are used as indices, the second is a stream of double values that the indices are looked up in, and the last is the output stream. The kernel should produce the same result as the build_matrix_cpu function, i.e. it should build the matrix that is described by the indices:

 kernel void rebuildMatrices(int2 indices<>,
                             double values[][],
                             out double result_matrix<>)
 {
     // gather: read the value at row indices.y, column indices.x
     result_matrix = values[indices.y][indices.x];
 }


The host program simply builds the index array and the value array and then calls the kernel. The result matrix should be the input matrix shifted one column to the left, and the freed column on the right side should be filled with a copy of the last column. It should look like this:

Input:   0  1  2  3
         4  5  6  7
         8  9 10 11

Output:  1  2  3  3
         5  6  7  7
         9 10 11 11
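To make the expected result concrete without Brook+, here is a minimal CPU-only sketch of what I want. It is not my real host code: `Idx2` is only a stand-in for Brook+'s int2, and `make_shift_left_indices` and `gather_cpu` are names made up for this illustration. It assumes row-major storage with dim2 values per row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Brook+'s int2 (x = column, y = row).
struct Idx2 {
    int x;
    int y;
};

// Build the indices for "shift one column left, repeat the last column".
std::vector<Idx2> make_shift_left_indices(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    std::vector<Idx2> idx(static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            // Read from column j+1, clamped to the last column.
            int src = (j + 1 < cols) ? j + 1 : cols - 1;
            idx[static_cast<std::size_t>(i) * cols + j] = Idx2{src, i};
        }
    }
    return idx;
}

// CPU gather: out[k] = values[idx[k].y][idx[k].x], row-major with stride cols.
std::vector<double> gather_cpu(int rows, int cols,
                               const std::vector<Idx2>& idx,
                               const std::vector<double>& values) {
    std::vector<double> out(values.size());
    for (int k = 0; k < rows * cols; ++k) {
        out[k] = values[static_cast<std::size_t>(idx[k].y) * cols + idx[k].x];
    }
    return out;
}
```

For the 3x4 input above, gather_cpu(3, 4, make_shift_left_indices(3, 4), values) returns 1 2 3 3 5 6 7 7 9 10 11 11, which is the output shown above.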

But the Brook+ program produces something else if I choose dim1 != dim2. What am I doing wrong?
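One observation, which I have not confirmed as the cause: a row-major offset y*dim2+x only agrees with y*dim1+x for every element when dim1 == dim2, so any mix-up of the two dimensions somewhere would stay hidden for square matrices. The sketch below only counts how many offsets differ. `count_stride_mismatches` is a made-up helper, and it says nothing about how Brook+ actually lays out its streams.

```cpp
// Counts the elements of a rows x cols matrix whose row-major offset changes
// when `rows` is (hypothetically) used as the row stride instead of `cols`.
int count_stride_mismatches(int rows, int cols) {
    int mismatches = 0;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            int right = y * cols + x;  // correct row-major offset
            int wrong = y * rows + x;  // offset with the dimensions swapped
            if (right != wrong) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

For 3x4 this reports 8 differing offsets (everything except row 0), and for 4x4 it reports 0.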

Here is the host code. At the end it prints the indices, the input and the output.


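#include <iostream>
#include <string>
using namespace std;

// prints a dim1 x dim2 matrix stored row by row, preceded by the label str
template <class T>
void print_matrix(const unsigned int dim1,
                  const unsigned int dim2,
                  const T *zahlen,
                  const string str)
{
     cout << str;
     for(int i=0;i<dim1;i++)
     {
          for(int j=0;j<dim2;j++)
          {
              cout << zahlen[i*dim2+j] << " ";
          }
          cout << "\n";
     }
}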



// prints an index as (row|column)
ostream& operator<<(ostream &os, const int2 &v)
{
     os << "("<<v.y<<"|"<<v.x<<")";
     return os;
}


// CPU reference: output[i][j] = values[indices[i][j].y][indices[i][j].x]
void build_matrix_cpu(const int dim1, const int dim2,
                   const int2 *indices,const double *values, double *output)
{
     for(int i=0;i<dim1;i++)
     {
          for(int j=0;j<dim2;j++)
          {
              double v =  values[(indices[i*dim2+j].y*dim2)+indices[i*dim2+j].x];
              output[i*dim2+j] = v;
          }
     }
}


int main()
{
     const unsigned int dim1 = 3;
     const unsigned int dim2 = 4;
     int2 *indices_rLeft         = new int2[dim1*dim2];

     // build the index array: every element reads from its right-hand neighbour
     for(int i=0;i<dim1;i++)
          for(int j=1;j<dim2;j++)
              indices_rLeft[i*dim2+j-1] = int2(j,i);

     // the last column has no right-hand neighbour, so it repeats the index before it
     for(int i=0;i<dim1;i++)
          for(int j=dim2-1;j<dim2;j++)
              indices_rLeft[i*dim2+j] = indices_rLeft[(i*dim2)+j-1];

     // fill the values array with numbers
     double zahlen[dim1*dim2];
     for(int i=0;i<dim1;i++)
          for(int j=0;j<dim2;j++)
              zahlen[i*dim2+j] =  i*dim2+j;

     const int rank = 2;
     unsigned int dims[] = {dim1,dim2 };

     double result[dim1*dim2];


     delete[] indices_rLeft;

     // wait for input so the console window stays open
     int k = 0;
     cin >> k;
     return 0;
}