5 Replies Latest reply on Aug 3, 2009 4:28 PM by hagen

    Is it allowed to read from an output stream?



      In some topics I read about reading from output streams. Is that possible?

      For example this kernel:

      kernel void adas(double a<>, out double s[][])
      {
          int2 i = instance();
          // Reads from 's', which is declared as an output stream
          double d = s[i.y][i.x];
      }


        • Is it allowed to read from an output stream?

          Theoretically that shouldn't be possible because 's' is a write-only stream. If Brook+ compiles it anyway, you may get undefined behaviour. Even if you define the kernel with another gather input parameter and use the same stream as both input and output, it could result in race conditions.
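To make the race condition concrete, here is a minimal CPU-side sketch in plain C, not Brook+ code. It assumes a hypothetical kernel in which each element adds its left neighbour to itself; all function names are made up for illustration. When the same buffer is both gathered from and written to, the result depends on the order in which elements are processed, which a GPU does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU model of a kernel that gathers its left neighbour:
   out[i] = in[i] + in[i-1], with in[-1] treated as 0. */

/* Separate input and output buffers: the result is independent of the
   order in which elements are processed. */
void neighbour_sum_separate(const double *in, double *out, size_t n)
{
    assert(in != out);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] + (i > 0 ? in[i - 1] : 0.0);
}

/* Same buffer, processed left to right: every element after the first
   reads a neighbour that has already been overwritten. */
void neighbour_sum_inplace_forward(double *s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        s[i] = s[i] + (i > 0 ? s[i - 1] : 0.0);
}

/* Same buffer, processed right to left: every element reads a neighbour
   that still holds its original value. */
void neighbour_sum_inplace_backward(double *s, size_t n)
{
    for (size_t i = n; i-- > 0; )
        s[i] = s[i] + (i > 0 ? s[i - 1] : 0.0);
}
```

For the input {1, 2, 3, 4}, the forward in-place version yields {1, 3, 6, 10}, while the backward version and the separate-buffer version both yield {1, 3, 5, 7}. On a GPU the effective order is unspecified, so an aliased gather could produce either result or a mix of both.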

          However, it looks like you can safely use the same stream as input and output if it is passed only as pure stream parameters (no gather or scatter). By default aliasing is disabled to prevent programming errors, but you can enable it with the environment variable BRT_PERMIT_READ_WRITE_ALIASING.
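The reason pure stream parameters are safe to alias can be sketched on the CPU in plain C (again, this is not Brook+ code, and the function names and the kernel body are hypothetical). Each element reads and writes only its own position, so no element can observe another element's update, whatever the processing order.

```c
#include <stddef.h>

/* Hypothetical elementwise kernel body: depends only on the element's own value. */
static double scale_and_shift(double x)
{
    return 2.0 * x + 1.0;
}

/* In-place update, left to right. */
void map_inplace_forward(double *s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        s[i] = scale_and_shift(s[i]);
}

/* In-place update, right to left: produces the same result as the
   forward version because element i never touches any other position. */
void map_inplace_backward(double *s, size_t n)
{
    for (size_t i = n; i-- > 0; )
        s[i] = scale_and_shift(s[i]);
}
```

Both orderings turn {1, 2, 3} into {3, 5, 7}, which is why aliasing a stream with itself is order-independent in the no-gather, no-scatter case.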

            • Is it allowed to read from an output stream?

              My understanding is the same as Ceq's. The problem on ATI is that if you use more than one stream for the same array (that is, you ping-pong between two streams that are essentially the same), you take up twice the memory on the GPU (which is already limited) for something that should only need one copy (as in CUDA). You can use the same stream as both input and output, but as Ceq says, you might have problems because the order of execution is not known.
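The ping-pong pattern mentioned here can be sketched in plain C as follows. This is a CPU-side illustration only: the smoothing step and all names are assumptions chosen for the example, not Brook+ API. It shows both the benefit (each step reads only from a buffer that is not being written) and the cost (two full-size buffers for one logical array).

```c
#include <stddef.h>

/* One hypothetical smoothing step: each interior element becomes the mean
   of its two neighbours in src; the two endpoints are copied unchanged. */
static void smooth_step(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || i + 1 == n)
            dst[i] = src[i];
        else
            dst[i] = 0.5 * (src[i - 1] + src[i + 1]);
    }
}

/* Runs `steps` iterations, alternating between buf_a and buf_b so that a
   step never reads the buffer it writes. Both buffers must hold n elements,
   which is the doubled memory footprint. The initial data is expected in
   buf_a. Returns the buffer that holds the final result. */
double *smooth_ping_pong(double *buf_a, double *buf_b, size_t n, unsigned steps)
{
    double *src = buf_a;
    double *dst = buf_b;
    for (unsigned k = 0; k < steps; ++k) {
        smooth_step(src, dst, n);
        double *tmp = src;  /* swap roles for the next iteration */
        src = dst;
        dst = tmp;
    }
    return src;
}
```

Starting from {0, 0, 8, 0, 0}, two steps give {0, 0, 4, 0, 0}, and the result ends up back in the first buffer because the roles were swapped twice.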

            • Is it allowed to read from an output stream?

              Theoretically yes, but I can't see any practical reason for wanting to do this. You mess up the communication pattern among streams when you use scatter or gather, and you will certainly take a performance hit. Anything you can do with plain streams, you should not do with gather or scatter. Only when you cannot do what you want with streams should you even contemplate gather/scatter (e.g. implementing local arrays in Brook+, or as in the optimized matrix multiply examples in the SDK).