I posted this question yesterday, but today I could not find it, so here it is again:

I just want to test the performance of the FireStream 9170 with a simple single-precision matrix-matrix multiplication, so I wrote a really simple kernel:

kernel void matmul_kernel(float a<>, float b<>, out float c<>)
{
    c = a * b; // evaluated once per element of c: c[i][j] = a[i][j] * b[i][j]
}
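Because a Brook+ kernel body runs once per element of the output stream, this kernel computes an element-wise product of two same-shaped n × n streams rather than a matrix product. A CPU reference for exactly that computation is handy for checking the values copied back from the GPU. The following is a minimal C sketch, assuming row-major host buffers; the helper names elementwise_mul_ref and max_abs_diff are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: CPU reference for the kernel above. For two n x n
   input streams of the same shape, the kernel body runs once per output
   element, so c[i][j] = a[i][j] * b[i][j]. Buffers are row-major. */
void elementwise_mul_ref(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n * n; ++i)
        c[i] = a[i] * b[i];
}

/* Hypothetical helper: largest absolute difference between two buffers,
   e.g. the GPU result copied back to the host and the CPU reference. */
float max_abs_diff(const float *x, const float *y, size_t count)
{
    float worst = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float d = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A true matrix-matrix product instead needs, for each output element, the dot product of a row of a with a column of b, so it reads n elements of each input per output element rather than one.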

If I run the executable with n × n matrices for n < 832, everything works. But for n = 832 or above, I get a segmentation fault.

I read that a 2D stream can be at most 8192 × 8192, so what is the problem? I compiled the code both with and without address translation (the -r flag). The behavior is exactly the same in both cases, except that the address-translated code is slower.
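Putting numbers on the two limits may help narrow this down. With 4-byte floats, one 832 × 832 matrix is 2,768,896 bytes, tiny next to the 268,435,456 bytes (256 MiB) of an 8192 × 8192 stream, so the stream size limit is unlikely to be the cause. Three such matrices (a, b and c), however, take 8,306,688 bytes, only 80 KiB below 8 MiB. If the host matrices are local arrays (an assumption; the host code is not shown) on a system with an 8 MiB stack (a common Linux default, see `ulimit -s`), that would be consistent with a crash appearing near n = 832. A small C sketch of the arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: an 8 MiB stack, a common Linux default (see `ulimit -s`). */
#define ASSUMED_STACK_BYTES ((size_t)8 * 1024 * 1024)

/* Bytes occupied by `count` n x n single-precision matrices. */
size_t matrices_bytes(size_t n, size_t count)
{
    /* Guard against size_t overflow in n * n * sizeof(float). */
    assert(n == 0 || n <= SIZE_MAX / sizeof(float) / n);
    size_t one = n * n * sizeof(float);
    assert(one == 0 || count <= SIZE_MAX / one);
    return count * one;
}

/* Hypothetical check: would `count` such matrices, declared as local
   arrays, fit within the assumed stack? Other stack usage is ignored,
   so the real threshold is somewhat lower. */
int fits_on_assumed_stack(size_t n, size_t count)
{
    return matrices_bytes(n, count) < ASSUMED_STACK_BYTES;
}
```

Ignoring all other stack usage, three matrices would stop fitting between n = 836 and n = 837; locals, call frames and the environment also live on the stack, which lowers that threshold in practice.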

I would be thankful for any help.

Perhaps your matrices are declared as local arrays and overflow the stack. For example, in MSVC, a function like this:

int function(...) {
    float output[4096][4096]; // 4096 * 4096 * 4 bytes = 64 MiB on the stack
    ...
}

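would probably crash with a stack overflow, because the local array needs 4096 × 4096 × 4 bytes = 64 MiB, far more than the 1 MB stack MSVC reserves by default. In that case you have to increase the stack size under: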

Project -> Properties -> Configuration Properties -> Linker -> System -> Stack Reserve Size (the /STACK linker option)
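An alternative to raising the stack size, not mentioned in the reply above, is to keep large matrices off the stack entirely by allocating them on the heap. A minimal C sketch, assuming the same 4096 × 4096 size as the example; the names DIM, row_t and alloc_output are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

#define DIM 4096 /* same dimension as the local array in the example */

/* One row of the matrix; a pointer to rows keeps the output[i][j] syntax. */
typedef float row_t[DIM];

/* Hypothetical helper: allocate a zero-initialised DIM x DIM float matrix
   (64 MiB) on the heap instead of the stack. Returns NULL on failure;
   release it with free(). */
row_t *alloc_output(void)
{
    return calloc(DIM, sizeof(row_t));
}
```

Usage would be `row_t *output = alloc_output();`, a NULL check, then `output[i][j]` exactly as with the local array, and finally `free(output);`. Compiled as C++, the result of `calloc` needs an explicit cast to `row_t *`.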

I hope this helps. If not, what happens if you comment out the line that calls the kernel? Do the Brook+ examples work?