I agree that this is rather limiting. It would be great if you could specify an index input stream whose size sets the number of threads, so that all threads run over that stream rather than over the output stream (for indexof purposes).
So in that example, the index of "a" is whatever the index of c is, not its own, since the domain of execution runs over c, correct?
So if c was running 100 threads, then for kernel calls 0 to 99 it would be a[0] to a[99], respectively, right?
Does this present a problem when the output stream is larger than the input stream and you want to write from the input to the output, so that the output index being assigned is much larger than the input index?
For example, in CPU code:
out[i+128*j+128*128*k] = in[i+128*j];
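To make the question concrete, here is a minimal CPU-only sketch of that assignment written two ways. The function names, the `nz` parameter (the number of copies along k), and the `float` element type are assumptions for illustration; they are not from the original post or from any stream API. The first version loops over the input indices and writes to a computed output index (a scatter). The second runs one iteration per output element, which is what a kernel whose domain of execution is the output stream would do, and reads the input at a computed index instead (a gather).

```c
#include <stddef.h>

/* Grid extents taken from the 128 x 128 example in the post. */
enum { NX = 128, NY = 128 };

/* Scatter form: iterate over the input indices (i, j) and the copy index k,
 * writing each input element to a computed, larger output index.
 * nz is a hypothetical parameter: how many copies of the input plane to write. */
void replicate_scatter(const float *in, float *out, size_t nz)
{
    for (size_t k = 0; k < nz; ++k)
        for (size_t j = 0; j < NY; ++j)
            for (size_t i = 0; i < NX; ++i)
                out[i + NX * j + NX * NY * k] = in[i + NX * j];
}

/* Gather form: one iteration per output element, the way a kernel running
 * over the output domain would. The input index is recovered from the
 * output index by dropping the k term. */
void replicate_gather(const float *in, float *out, size_t nz)
{
    const size_t plane = (size_t)NX * NY;   /* elements in one input plane */
    for (size_t o = 0; o < plane * nz; ++o)
        out[o] = in[o % plane];             /* o % plane == i + NX * j */
}
```

Both functions fill `out` with the same values. The gather form only works because this particular mapping can be inverted from the output index; a mapping that cannot be inverted that way would still need a real scatter.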