Can a kernel use two output streams, one regular (<>) and the other scatter ([])?
Only one scatter stream can be used in a kernel due to a hardware limitation.
However, you can write to multiple locations in the same stream. That lets you emulate several scatter and regular output streams with a single scatter stream: write the data you would have sent to the regular output stream into the first half of the scatter stream, and use the second half as a separate scatter stream.
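To make the half-and-half layout concrete, here is a minimal CPU-side sketch in Python rather than real kernel code. The function name, the doubling computation, and the `keys` input are all hypothetical; the only point is that one output buffer of size 2n can carry both the position-aligned result and the scattered result.

```python
def run_emulated_kernel(values, keys):
    """CPU sketch of one scatter buffer standing in for two outputs.

    Assumption: the buffer is split in half. out[0:n] plays the role of
    the regular output stream, out[n:2n] plays the role of the scatter
    stream. keys[i] is the (hypothetical) scatter address for element i.
    """
    n = len(values)
    out = [0] * (2 * n)  # the single scatter stream
    for i in range(n):  # each iteration stands in for one kernel instance
        # "Regular" output: written at the instance's own position.
        out[i] = values[i] * 2
        # "Scatter" output: written at an arbitrary address in the second half.
        out[n + keys[i]] = values[i]
    # The host splits the buffer back into the two logical streams.
    return out[:n], out[n:]


regular, scattered = run_emulated_kernel([10, 20, 30], [2, 0, 1])
```

The cost of this layout is that the host (or a later kernel) has to know the offset n to read the second half, so the split point becomes part of the interface.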
Could you please describe a situation where combining scatter and regular streams as outputs is useful?
What algorithm or idea led you to this combination?
One example is building a neighbor table. The index array can be a regular stream, and the list pointed to by the index is a scatter stream.
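A rough CPU-side sketch of such a table in Python, under several assumptions of mine: points are 1D, "neighbor" means within a fixed radius, and the table is built in two passes (count, then fill). The names `build_neighbor_table`, `starts`, `counts`, and `neighbors` are made up for illustration. `starts` and `counts` are written one entry per point, like a regular stream; `neighbors` is written at computed addresses, like a scatter stream.

```python
def build_neighbor_table(points, radius):
    """Sketch of an index array plus a packed neighbor list."""
    n = len(points)

    def is_neighbor(i, j):
        return i != j and abs(points[j] - points[i]) <= radius

    # Pass 1 (regular-style output): one neighbor count per point.
    counts = [sum(1 for j in range(n) if is_neighbor(i, j)) for i in range(n)]

    # Exclusive prefix sum gives each point's start offset in the list.
    starts = []
    total = 0
    for c in counts:
        starts.append(total)
        total += c

    # Pass 2 (scatter-style output): each point writes its neighbors
    # into its own slice of the shared list.
    neighbors = [-1] * total
    for i in range(n):
        k = starts[i]
        for j in range(n):
            if is_neighbor(i, j):
                neighbors[k] = j
                k += 1
    return starts, counts, neighbors


starts, counts, neighbors = build_neighbor_table([0.0, 1.0, 2.0, 5.0], 1.0)
# Neighbors of point i are neighbors[starts[i] : starts[i] + counts[i]].
```

Splitting the work into a counting pass and a fill pass is just one way to get non-overlapping write addresses; a single kernel writing both outputs at once is what the combination of regular and scatter streams would allow.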
Another question: can a scatter stream be 2/3D?
Anyway, for general-purpose programming, I don't think it is a good idea to assume that a function will never be used and put that much burden on the user. We do not see any such limitation in CUDA (please don't argue that this approach leads to poor performance).
Another question: can a scatter stream be 2/3D?
Yes.