Help needed with the "gather()" operation in Brook+: the result is always zero

I am trying to learn Brook+ and as such have implemented the sparse matrix-vector example that comes with the beta 1.4 download within Visual Studio 8. However I am encountering some strange behavior with the gather method - perhaps I don't understand the operation.

Let Size be 6333: this is the total number of non-zero elements of the 'a' matrix, which are stored in the array 'ahat'. Let Length be Size * NzWidth, where NzWidth is the maximum number of non-zero elements in any row of the original 'a' matrix. For my 'a' matrix, NzWidth is 12.
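To make the padded layout concrete, here is a minimal plain C++ sketch of the same idea as reshuffleData() in the code further down: every row gets exactly NzWidth slots, and unused slots hold a zero value and a zero column index. The names PaddedMatrix and padCsr are hypothetical, not part of the Brook+ sample, and the sketch assumes rowStart has one entry per row plus a final end marker.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the padded layout (not from the Brook+ sample).
struct PaddedMatrix {
    std::vector<float> values;   // rows * width entries, zero in padding slots
    std::vector<float> columns;  // column indices stored as float, zero in padding slots
};

// Pads each CSR row to exactly `width` slots.
// Assumption: rowStart.size() == number of rows + 1.
PaddedMatrix padCsr(const std::vector<float>& nz,
                    const std::vector<int>& cols,
                    const std::vector<int>& rowStart,
                    std::size_t width) {
    PaddedMatrix out;
    if (rowStart.empty()) {
        return out;  // no rows at all
    }
    const std::size_t rows = rowStart.size() - 1;
    out.values.assign(rows * width, 0.0f);
    out.columns.assign(rows * width, 0.0f);
    for (std::size_t i = 0; i < rows; ++i) {
        std::size_t offset = 0;
        // Copy the row's non-zeros; entries beyond `width` are dropped in this sketch.
        for (int j = rowStart[i]; j < rowStart[i + 1] && offset < width; ++j, ++offset) {
            out.values[i * width + offset] = nz[j];
            out.columns[i * width + offset] = static_cast<float>(cols[j]);
        }
        // Remaining slots keep the zero padding written by assign().
    }
    return out;
}
```

For a 3-row matrix with rowStart = {0, 2, 3, 5} and a width of 2, the padded arrays have 3 * 2 = 6 entries, and the second row ends with one padding slot.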

The result of calling the gather operation should be at least one value within the result array that is NON-ZERO, but I get NO NON-ZERO elements in the result array. I have no idea why, but could use some help from the experts.

Thanks in advance for ANY hints/ideas

Below is the relevant code:

    kernel void gather(float index<>, float x[], out float result<>) {
        result = x[index];
    }

    void reshuffleData(float *&nz, int *&cols, int *&rowStart, float *&Anz, float *&Acols, unsigned int size, unsigned int nzWidth){
        unsigned int i;
        int j;
        for (i = 0; i < size; i++){
            unsigned int offset = 0;
            for (j = rowStart[i]; j < rowStart[i + 1]; j++) {
                Anz[nzWidth * i + offset] = nz[j];
                Acols[nzWidth * i + offset] = (float)cols[j];
                offset++;
            }
            // must pad the rest of the row
            while (offset < nzWidth) {
                Anz[nzWidth * i + offset] = 0.0f;
                Acols[nzWidth * i + offset] = (float)0.0f; // this should be an invalid index.... but doesn't have to be since x multiplied by a zero here
                offset++;
            }
        }//OUTER FOR-LOOP
    }//reshuffleData()

    void gpuMatVecMult(unsigned int size, unsigned int length, float *&cIdx, float *&aNz, float *&x, float *&y){
        unsigned int i;

        // System Memory:
        Stream<float> AStrm(1, &length);       // Non-Zeros of A
        Stream<float> AStrm2(1, &length);      // Non-Zeros of A
        Stream<float> indices(1, &length);     // Column Indices
        Stream<float> tmp_indices(1, &length); // Temp. Indices
        Stream<float> xStream(1, &size);
        Stream<float> yStream(1, &size);

        // CPU->GPU:
        indices.read(cIdx);
        xStream.read(x);

        // Kernel Calls:
        gather(indices, xStream, tmp_indices);

        // GPU->CPU:
        indices.write(cIdx);

        for(i = 0; i < length; i++){
            float fv = cIdx[i];
            if(fv != 0.0f){
                // I should get at least ONE non-zero, but I get
                // no output here!!
                cout << "Column[" << i << "]=" << fv << endl;
            }
        }
    }

    // Is size of NON-ZERO array of original 'a' matrix - i.e., 'ahat':
    unsigned int Size = nn;
    unsigned int Length = Size * nzWidth;
    float *cIdx = new float[Length];
    float *Anz = new float[Length];

    // "Reshuffle" Data for STREAMING:
    reshuffleData(ahat, csrCols, csrRows, Anz, cIdx, Size, nzWidth);

    // Use GPU to compute Matrix-Vector Multiplication:
    gpuMatVecMult(Size, Length, cIdx, ahat, p, u);

What results do you see with CPU backend?
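If it helps with the comparison, a plain C++ reference for what the gather kernel is meant to compute (result[i] = x[index[i]]) could look roughly like the sketch below. The names gatherReference and countNonZero are hypothetical, and treating out-of-range indices as zero is an assumption of this sketch, not documented Brook+ behavior.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the gather kernel: result[i] = x[index[i]].
std::vector<float> gatherReference(const std::vector<float>& index,
                                   const std::vector<float>& x) {
    std::vector<float> result(index.size(), 0.0f);
    for (std::size_t i = 0; i < index.size(); ++i) {
        // The post stores column indices as floats; truncate to an integer here.
        const long k = static_cast<long>(index[i]);
        if (k >= 0 && static_cast<std::size_t>(k) < x.size()) {
            result[i] = x[static_cast<std::size_t>(k)];
        }
        // Assumption: out-of-range indices leave result[i] at 0.
    }
    return result;
}

// Counts the entries that are not exactly zero.
std::size_t countNonZero(const std::vector<float>& v) {
    std::size_t n = 0;
    for (float f : v) {
        if (f != 0.0f) {
            ++n;
        }
    }
    return n;
}
```

Running this on the same cIdx and x arrays gives a baseline to compare against the data read back from the kernel's output stream, which is tmp_indices in the posted call gather(indices, xStream, tmp_indices).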