I posted this question yesterday, but today I could not find it anymore, so here it is again:
I just want to test the performance of the FireStream 9170 with a simple matrix-matrix multiplication in single precision, so I wrote a really simple kernel like this:
kernel void matmul_kernel(float a<>, float b<>, out float c<>)
{
    c = a * b;
}
If I run the executable with n x n matrices where n < 832, everything works well. But if I start it with n = 832 or above, I get a segmentation fault.
I read that the size of a 2D stream is limited to 8192x8192, so what is the problem? I compiled it both with address translation and without (-r flag), and the behavior is exactly the same in both cases, except that the address-translated code performs worse.
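As a side check on the sizes involved, here is a minimal sketch in plain C of how much host memory three n x n single-precision matrices need. The helper names matrices_bytes and exceeds_limit are made up for illustration and are not part of Brook+ or the FireStream SDK. It assumes the matrices are ordinary float arrays on the host; how they are actually allocated is not shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for `count` n x n matrices of float.
 * The assert guards against size_t overflow for very large n. */
size_t matrices_bytes(size_t n, size_t count)
{
    assert(n == 0 || count <= SIZE_MAX / sizeof(float) / n / n);
    return count * n * n * sizeof(float);
}

/* Hypothetical helper: returns 1 if `count` n x n float matrices need
 * more than `limit_bytes`, 0 otherwise. Any limit passed in here is an
 * assumption of the caller, not a documented value of the SDK. */
int exceeds_limit(size_t n, size_t count, size_t limit_bytes)
{
    return matrices_bytes(n, count) > limit_bytes;
}
```

With a 4-byte float, matrices_bytes(832, 3) gives 8306688 bytes, which is just under 8 MiB (8388608 bytes). 8 MiB is a common default stack size on Linux, so if the host arrays happened to be local variables, a stack overflow would be one possible explanation for the crash. That is only a guess under the stated assumptions and has not been confirmed for this setup.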
I would be thankful for any kind of help.