Hi,
In some topics I read about reading from output streams. Is it possible?
For example, this kernel:
kernel void adas(double a<>, out double s[][])
{
int2 i = instance();
double d = s[i.y][i.x];
}
Theoretically that shouldn't be possible because 's' is a write-only stream. If Brook+ compiles it anyway, you may get undefined behaviour. Even if you define the kernel with another gather input parameter and pass the same stream as both input and output, it could result in race conditions.
However, it looks like you can safely use the same stream as input and output if it is passed only as plain stream parameters (no gather or scatter). By default aliasing is disabled to prevent programming errors, but you can enable it with the environment variable BRT_PERMIT_READ_WRITE_ALIASING.
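To make the difference concrete, here is a minimal sketch in plain C (not Brook+) that simulates running a "kernel" in place over one buffer. The function names and the `reverse` flag are hypothetical, made up for illustration; the flag only stands in for the fact that the GPU may process elements in any order. With a pure element-wise update the order doesn't matter, but once an element reads a neighbour (a gather) from the buffer it is also writing to, the result depends on the order.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise update in place: element i reads only buf[i] and
 * writes only buf[i], so the visiting order cannot change the result. */
void scale_in_place(double *buf, size_t n, int reverse)
{
    assert(buf != NULL);
    for (size_t k = 0; k < n; ++k) {
        size_t i = reverse ? n - 1 - k : k;
        buf[i] = 2.0 * buf[i];
    }
}

/* Gather in place: element i also reads its right-hand neighbour
 * (wrapping around), which may or may not have been overwritten
 * already, so the result depends on the visiting order. */
void gather_in_place(double *buf, size_t n, int reverse)
{
    assert(buf != NULL);
    for (size_t k = 0; k < n; ++k) {
        size_t i = reverse ? n - 1 - k : k;
        buf[i] = buf[i] + buf[(i + 1) % n];
    }
}
```

Calling `scale_in_place` on the same data with `reverse` set to 0 and to 1 gives identical buffers, while `gather_in_place` gives two different results, and neither matches what you would get with a separate output buffer. That is the serial analogue of the race condition described above.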
My understanding is the same as Ceq's. The problem on ATI is that if you use more than one stream for the same array (that is, if you ping-pong between two streams that are essentially the same), you take up twice the memory on the GPU (which is already limited) for something that should only need one copy (as in CUDA). You can use the same stream as both input and output, but as Ceq says, you might have problems with that since the order of execution is not known.
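For anyone not familiar with the ping-pong pattern, this is roughly what it looks like, again as a plain C sketch rather than real Brook+ code. The names `neighbour_sum` and `ping_pong` are hypothetical. Each pass reads from one buffer and writes to the other, then the two swap roles, so the gather is always safe but you pay for two full-size buffers.

```c
#include <stddef.h>

/* One pass: every output element gathers from the input buffer only,
 * so there is no read/write hazard regardless of execution order. */
void neighbour_sum(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] + in[(i + 1) % n];
    }
}

/* Runs `passes` passes, alternating between the two buffers.
 * Returns a pointer to whichever buffer holds the final result.
 * Memory cost: two buffers of n elements instead of one. */
double *ping_pong(double *a, double *b, size_t n, int passes)
{
    double *src = a;
    double *dst = b;
    for (int p = 0; p < passes; ++p) {
        neighbour_sum(src, dst, n);
        double *tmp = src;  /* swap roles for the next pass */
        src = dst;
        dst = tmp;
    }
    return src;
}
```

After an even number of passes the result ends up back in the first buffer, after an odd number it is in the second, which is why the function returns the pointer instead of assuming one or the other.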
In PS, using the same stream as input and output is dangerous unless the condition Ceq explained holds.
However, in CS we have better control over how the program runs, so using the same buffer as both input and output is workable as long as you know clearly what is going on.
And gather/scatter isn't that horrible if you carefully design how you sample the input data and scatter to the global buffer. On CUDA, people emphasize the importance of coalesced memory access, and that is very true for ATI too!
Peterp's question is about Brook+ specifically, not PS, CS, or CUDA, right?
Theoretically yes, but I can't see any practical reason for wanting to do this. You mess up the communication pattern among streams when you use scatter or gather, and you will certainly take a performance hit. Anything you can do with plain streams, you don't want to do with gather or scatter. Only in cases where you cannot do what you want with streams should you even contemplate gather/scatter (e.g. implementing local arrays in Brook+, or as in the optimized matrix multiply examples in the SDK).