In trying to walk through (rationalize) "simple_matmult" sample code below I noticed "vPos" is declared & initialized and then used only once, in the initialization of "index". Is this use of "indexof(result).xy" an example of "think of the kernel body as being executed on every element in the output stream"?
The kernel code shows arguments "float A, float B" as doubly indexed, but their use appears to be singly indexed: "A[index.zw]*B[index.xy]". I assume this is an example of "gather stream" arguments being indexed with a float2 vector?
You got both right.
Note that the comment // A[k] * B[k][j] should actually read // A[i][k] * B[k][j]. The forum software eats up the [i]s.
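If it helps to see both points at once, here is a rough CPU emulation in Python. It is not the sample's source, only a sketch under assumptions: the names gather, simple_matmult_element and run_kernel are made up for illustration, the step vector and loop are my guess at how "index" gets advanced, and I am assuming a float2 gather index is ordered (x, y) = (column, row). The kernel body runs once per output element, and each gather is a 2D lookup driven by a two-component index.

```python
def gather(M, idx):
    """Hypothetical stand-in for a gather: fetch M[row][col] from an (x, y) = (col, row) pair."""
    x, y = idx
    return M[int(y)][int(x)]


def simple_matmult_element(width, A, B, pos):
    """Emulates the kernel body for ONE output element at pos = (x, y)."""
    vx, vy = pos                    # plays the role of indexof(result).xy
    index = [vx, 0.0, 0.0, vy]      # components (x, y, z, w); assumed layout
    step = [0.0, 1.0, 1.0, 0.0]     # assumed: advances k in both .y and .z
    acc = 0.0
    for _ in range(width):
        # A[i][k] * B[k][j]: index.zw = (k, i) and index.xy = (j, k)
        a = gather(A, (index[2], index[3]))
        b = gather(B, (index[0], index[1]))
        acc += a * b
        index = [c + s for c, s in zip(index, step)]
    return acc


def run_kernel(width, A, B):
    """Runs the body once for every element of the output, as the runtime would."""
    rows, cols = len(A), len(B[0])
    return [
        [simple_matmult_element(width, A, B, (float(x), float(y))) for x in range(cols)]
        for y in range(rows)
    ]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
result = run_kernel(2, A, B)
print(result)  # [[19.0, 22.0], [43.0, 50.0]]
```

In this sketch the only thing that differs between two invocations of the body is the output position, which is why vPos is read once and then folded into index.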