You seem to be reading from and writing to the same stream, which is not currently possible in Brook+. To make it work, you would need a scatter output and a regular output stream in the same kernel, and that again is a limitation in Brook+. But you can try dividing your kernel into two parts:
kernel void Kernel1(int frame<>, out int mask[][])
{
    // edge (grid cell size in pixels) is assumed to come from your original kernel
    int2 vPos = instance().xy;
    int xMask = vPos.x / edge;
    int yMask = vPos.y / edge;
    int2 maskPos = int2(xMask, yMask); // index of the grid cell where the current pixel is located
    int color = frame;                 // current pixel
    int red = color & 0xFF;            // get r-channel
    if (red > 200)
        mask[maskPos]++;
}
and
kernel void Kernel2(int frame<>, out int outputFrame<>)
{
    outputFrame = frame | 0x00FF0000;
}
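If it helps with checking the GPU results, here is a rough CPU-side sketch of what the two kernels are meant to compute. It assumes the intent of Kernel1 is a per-cell count of pixels whose red channel exceeds 200, and that the frame is stored row-major as one int per pixel. The names referenceMask and referenceOutput are hypothetical helpers, not part of Brook+.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for Kernel1 (assumption: row-major frame,
// one int per pixel). Counts, for each grid cell of edge x edge pixels,
// how many pixels have a red channel above 200.
std::vector<int> referenceMask(const std::vector<int>& frame,
                               int width, int height, int edge) {
    assert(width > 0 && height > 0 && edge > 0);
    assert(frame.size() == static_cast<std::size_t>(width) * height);

    const int cellsX = (width + edge - 1) / edge;   // cells per row, rounded up
    const int cellsY = (height + edge - 1) / edge;  // cells per column, rounded up
    std::vector<int> mask(static_cast<std::size_t>(cellsX) * cellsY, 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int color = frame[static_cast<std::size_t>(y) * width + x];
            const int red = color & 0xFF;           // same channel test as Kernel1
            if (red > 200) {
                ++mask[static_cast<std::size_t>(y / edge) * cellsX + (x / edge)];
            }
        }
    }
    return mask;
}

// Hypothetical CPU reference for Kernel2: sets the 0x00FF0000 bits of every pixel.
std::vector<int> referenceOutput(const std::vector<int>& frame) {
    std::vector<int> out(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        out[i] = frame[i] | 0x00FF0000;
    }
    return out;
}
```

Comparing the streams read back from the GPU against these two vectors is a quick way to see whether the split kernels behave as intended.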
Also, you need to set the execution domain on Kernel1 from the host-side code:
// Set domain offset
Kernel1.domainOffset(uint4(0, 0, 0, 0));
Kernel1.domainSize(uint4(704, 576, 1, 1));
// call kernel
Kernel1(img, mask);
But I think it's very unlikely that this code will work as is, because the underlying hardware has limitations on the stream size and the execution domain when scatter is used (both have to be 64-aligned). If you can resolve these alignment constraints, it should work.
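As a rough sketch of how the alignment could be checked on the host, the helpers below test a dimension against a multiple of 64 and round it up when it does not fit. This assumes the constraint applies to each dimension separately. The function names are hypothetical and not part of the Brook+ runtime.

```cpp
#include <cassert>

// Hypothetical helper: true if value is a multiple of alignment.
bool isAligned(unsigned value, unsigned alignment) {
    assert(alignment > 0);
    return value % alignment == 0;
}

// Hypothetical helper: rounds value up to the next multiple of alignment.
unsigned roundUpToMultiple(unsigned value, unsigned alignment) {
    assert(alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}
```

For the domain used above, 704 = 11 * 64 and 576 = 9 * 64, so the execution domain itself is already 64-aligned. The mask stream is the more likely problem: with a cell size of 16 pixels, for instance (a made-up value), it would be 44 x 36 and would have to be padded to 64 x 64, with the extra cells ignored when reading the result back.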