*SOLVED*
Hello.
I've written a simple reduction kernel (by the way, there's a bug in the Stream_Computing_User_Guide.pdf shipped with Brook+ 1.2.1 beta, in the section about reduction kernels; I think that syntax isn't supported anymore). It looks like this:
reduce void GpuSum(float a<>, reduce float s)
{
s += a;
}
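float Sum(int n, float *a)
{
float sa<n>; // input stream of n elements
float r; // receives the reduced value
streamRead(sa, a); // copy the host array into the stream
GpuSum(sa, r);
return r;
}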
The main function that calls it looks like this:
int main()
{
const int SIZE = 100 * 1024;
float a[SIZE];
for (int i = 0; i < SIZE; ++i)
*(a + i) = static_cast<float>(i);
float r = Sum(SIZE, a);
printf("%f\n", r);
return 0;
}
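For checking the value that main prints, a CPU-side reference can help. This is only a sketch and not part of the program above: ReferenceSum and ExpectedRampSum are hypothetical helpers. They work in double because the exact sum of 0..n-1 is n*(n-1)/2, which for these sizes is far beyond the range where a float holds every integer exactly.

```cpp
#include <cstddef>

// Hypothetical helper (not in the original program): sums n floats on the
// CPU, accumulating in double to limit rounding error.
double ReferenceSum(std::size_t n, const float *a)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

// Hypothetical helper: closed form n*(n-1)/2 for the input a[i] = i
// that main fills in.
double ExpectedRampSum(std::size_t n)
{
    if (n == 0)
        return 0.0;
    return 0.5 * static_cast<double>(n) * static_cast<double>(n - 1);
}
```

The GPU result is accumulated in float, so it will probably differ a little from these values; comparing with a relative tolerance is safer than testing for equality.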
The thing is, I can't get SIZE to be 1024 * 1024 (the program simply crashes). 100 * 1024 works fine, and so does 224 * 1024. 228 * 1024 prints "Failed to find usable kernel fragment to implement requested reduction.", as if no address virtualization was supplied, while 256 * 1024 makes my app crash outright.
Is this normal? The documentation states that 1D arrays of up to 8192 * 8192 = 64M elements can be accessed when the code is compiled by Brook+ without -r, so what am I doing wrong here?
Radeon HD 4850 with Catalyst 8.10 WHQL
Windows Server 2008 x64
Brook+ and CAL 1.2.1 beta x64
Visual Studio 2008, x64 Release build
Later edit:
OK, by simply changing the declarations in main to
const int SIZE = 1024 * 1024;
float *a = new float[ SIZE ];
the program seems to work now, but when SIZE is 228 * 1024 it still gives the "Failed to find usable kernel fragment to implement requested reduction." error.
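A likely reason the heap allocation cures the crashes (though not the 228 * 1024 error) is stack size. The sketch below only does the arithmetic. It assumes the 1 MB default stack reserve that the Visual C++ linker uses when /STACK is not set. FloatArrayBytes, kAssumedStackBytes and ExceedsAssumedStack are hypothetical names, not part of Brook+ or the original program.

```cpp
#include <cstddef>

// Bytes occupied by an array of `count` floats.
std::size_t FloatArrayBytes(std::size_t count)
{
    return count * sizeof(float);
}

// Assumption: default stack reserve of 1 MB for an executable linked
// with Visual C++ and no /STACK option.
const std::size_t kAssumedStackBytes = 1024 * 1024;

// True when a local float array of `count` elements would by itself
// fill or exceed the assumed stack.
bool ExceedsAssumedStack(std::size_t count)
{
    return FloatArrayBytes(count) >= kAssumedStackBytes;
}
```

With 4-byte floats, 224 * 1024 and 228 * 1024 elements stay under 1 MB, while 256 * 1024 elements are exactly 1 MB and 1024 * 1024 elements are 4 MB. That matches the sizes at which the local array version crashed, and new float[SIZE] avoids the limit by taking the memory from the heap.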