I understand that in the Brook+ programming model each element of the output stream automatically becomes a thread, and that each thread must be communication-free.
The following code is communication-free with respect to the outer loop (index i), but it seems it can't be parallelized in Brook+ because the inner loop is not communication-free: each A[i][j] depends on A[i][j-1]. Is there any way to implement this code in Brook+, either parallelized over the outer loop or in some other way?
    for (int i = 0; i < N; i++) {
        for (int j = 1; j < N; j++) {
            A[i][j] = A[i][j-1] + 1;
        }
    }
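
One observation that may help: for this particular recurrence the inner loop appears to unroll to a closed form, A[i][j] = A[i][0] + j, so every output element would depend only on the first column of its own row and not on a neighbouring result. The sketch below is plain C, not Brook+ kernel syntax, and only illustrates that rewrite; the names fill_sequential, fill_independent, first_col and the size N = 8 are assumptions made up for the example.

```c
#define N 8 /* hypothetical size, chosen only for illustration */

/* Original form: each element depends on its left neighbour,
   so the inner loop must run in order. */
void fill_sequential(int A[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 1; j < N; j++) {
            A[i][j] = A[i][j - 1] + 1;
        }
    }
}

/* Closed form: out[i][j] reads only first_col[i], never another
   output element, so each (i, j) could be computed independently. */
void fill_independent(const int first_col[N], int out[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            out[i][j] = first_col[i] + j;
        }
    }
}
```

If this holds, the body of fill_independent is the kind of per-element computation that could map to one thread per output element. This trick relies on the recurrence having a closed form; a general loop-carried dependency would need a different approach, such as a parallel prefix sum or one thread per row.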