I am confused by a few concepts in the SDK:
1. What is the difference between the global buffer, the input buffer and the output buffer in hardware? And what rule decides how normal streams, gather streams and scatter streams are allocated to these spaces?
2. Is scatter stream access always uncached? I found that using a scatter stream can drop performance to 20% of what a normal stream achieves.
3. Does SDK 1.2 support multiple output streams with different domains? Like this:
kernelName(input1, output1.domain(x,x), output2.domain(y,y))
(where x != y). In the previous SDK 1.1 this was a bug. Has it been resolved?