I've written a simple reduction kernel (btw, there's a bug in the Stream_Computing_User_Guide.pdf shipped with Brook+ 1.2.1 beta, in the section about reduction kernels; I think that syntax isn't supported anymore). It looks like this:
[code]reduce void GpuSum(float a<>, reduce float s)
{
    s += a;
}

float Sum(int n, float *a)
[/code]
The main function that calls it looks like this:
[code]const int SIZE = 100 * 1024;
for (int i = 0; i < SIZE; ++i)
    *(a + i) = static_cast<float>(i);
float r = Sum(SIZE, a);
[/code]
The thing is, I can't get SIZE up to 1024 * 1024 (the program simply crashes). 100 * 1024 works fine, and so does 224 * 1024. 228 * 1024 prints "Failed to find usable kernel fragment to implement requested reduction." as if no address virtualization were available, while 256 * 1024 crashes my app outright.
Is this normal? The documentation states that 1D arrays of up to 8192 * 8192 = 64M elements can be accessed when the Brook+ code is compiled without -r, so what am I doing wrong here?
Radeon HD 4850 with Catalyst 8.10 WHQL
Windows Server 2008 x64
Brook+ and CAL beta 1.2.1 x64
Visual Studio 2008 with x64 Release build
OK, by simply changing the declaration in main to
[code] const int SIZE = 1024 * 1024;
float *a = new float[ SIZE ];[/code]
the thing seems to work now, but when SIZE is 228 * 1024 the program still gives the "Failed to find usable kernel fragment to implement requested reduction." error.
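
A likely explanation for the crash, under two assumptions: the original `a` was a plain local array in main (that declaration isn't shown above), and the executable uses the default 1 MB stack that the Visual C++ linker reserves. 256 * 1024 floats is exactly 1 MB, so the array alone would fill the stack, while `new float[SIZE]` lives on the heap. This does not explain the 228 * 1024 reduction error. Below is a minimal sketch of the arithmetic plus a heap-backed buffer; `bytes_for`, `exceeds_assumed_stack`, `make_input` and `kAssumedStackBytes` are made-up names for illustration only:

```cpp
#include <cstddef>
#include <vector>

// ASSUMPTION: default stack reserve of an MSVC-linked executable (1 MB).
const std::size_t kAssumedStackBytes = 1024u * 1024u;

// Bytes occupied by n floats.
std::size_t bytes_for(std::size_t n) {
    return n * sizeof(float);
}

// True when an n-element float array alone would fill or exceed the
// assumed stack, i.e. when declaring it as a local array cannot work.
bool exceeds_assumed_stack(std::size_t n) {
    return bytes_for(n) >= kAssumedStackBytes;
}

// Heap-backed input buffer filled with 0, 1, 2, ... like the loop in main.
// std::vector keeps its elements on the heap, so n is not limited by the stack.
std::vector<float> make_input(std::size_t n) {
    std::vector<float> a(n);
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<float>(i);
    return a;
}
```

With a buffer built this way, the call from the post would become something like `Sum(SIZE, &input[0])`, where `input` is the vector returned by `make_input(SIZE)`.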