It's a "streaming" environment, so it has limitations that other GPGPU solutions do not have: a kernel can't read from and write to the same stream. CUDA doesn't have this read/write limitation, and I don't think OpenCL (which should be out soon) will have it either.
That said, you can just create two streams: an input stream and an output stream. This is how you "handle" it, and it would be the same for any case where you want to take an array (stream) as input, manipulate it, and then output it. You just have to be careful: if the algorithm in the kernel doesn't write every element of the stream, you need to explicitly copy the untouched elements over to the output stream (or copy every element at the start of the kernel; I haven't really noticed a difference between the two).
Needing the second stream really only becomes a problem if you start to run out of memory, and not even then really, considering the max stream size is 8192x8192.
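To make the two-stream pattern concrete, here is a minimal sketch in plain Python rather than in the streaming language itself. It only emulates the idea on the CPU: `run_kernel` and `double_evens` are hypothetical names made up for illustration, not part of any GPGPU API. The kernel reads only from the input stream, writes only to the output stream, and explicitly copies through the elements it doesn't modify.

```python
def run_kernel(input_stream, output_stream, kernel):
    """Emulate a streaming kernel: read only from input, write only to output."""
    for i, value in enumerate(input_stream):
        output_stream[i] = kernel(i, value)


def double_evens(i, value):
    # Hypothetical kernel: doubles the elements at even indices.
    if i % 2 == 0:
        return value * 2
    # Elements the algorithm doesn't touch are copied through explicitly;
    # otherwise the output slot would keep whatever it held before.
    return value


data_in = [1, 2, 3, 4, 5]
data_out = [0] * len(data_in)  # separate output stream of the same size

run_kernel(data_in, data_out, double_evens)
print(data_out)
```

If `double_evens` returned nothing for the odd indices, those output slots would not contain the original values, which is the pitfall mentioned above. The cost of this approach is the second buffer, so memory use for the stream is roughly doubled.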