Problems with big matrices

Segfault with matrix-matrix multiplication on matrices larger than 831x831

I already posted this question yesterday, but today I could not find it anymore, so here it is again:

I just want to test the performance of the FireStream 9170 with a simple matrix-matrix multiplication in single precision, so I wrote a really simple kernel like this:

kernel void matmul_kernel(float a<>, float b<>, out float c<>)
{
    c = a * b;
}

If I run the executable with n x n matrices where n < 832, everything works fine. But if I start it with n = 832 or above, I get a segmentation fault.

I read that the size of a 2D stream is limited to 8192x8192, so what is the problem? I compiled the code both with address translation and without (-r flag), and the behavior is exactly the same in both cases, except that the address-translated code is slower.
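
For reference, here is a minimal C sketch of the size check behind that statement. It only does the arithmetic for the numbers quoted above; the helper names (matrix_bytes, fits_stream_limit) and the MAX_STREAM_DIM constant are hypothetical and not part of any SDK.

```c
#include <stddef.h>

/* Per-dimension 2D stream limit as quoted in the post (assumed, not
   taken from an SDK header). */
#define MAX_STREAM_DIM 8192

/* Bytes needed for one n x n single-precision matrix. */
static size_t matrix_bytes(size_t n)
{
    return n * n * sizeof(float);
}

/* Returns 1 if an n x n stream stays within the quoted 2D limit,
   0 otherwise. */
static int fits_stream_limit(size_t n)
{
    return n <= MAX_STREAM_DIM;
}
```

With 4-byte floats, matrix_bytes(832) is 2768896 and fits_stream_limit(832) is 1, so an 832x832 stream is roughly a tenth of the quoted limit in each dimension. The stream size limit alone does not seem to explain the crash.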

I would be thankful for any kind of help.