I wonder how data is stored on the GPU. That is, if I have a 2-D array A on the CPU, what is the data layout after calling streamRead(Agpu, A)? Is it:
A[0,0] A[0,1] A[0,2] A[0,3]
A[1,0] A[1,1] A[1,2] A[1,3]
A[2,0] A[2,1] A[2,2] A[2,3]
as on the CPU?
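To be clear about what I mean by the CPU layout, here is a minimal C sketch. It assumes a 3x4 float array (the sizes and the helper names flat_index and cpu_layout_is_row_major are made up for illustration) and only shows the CPU side, where C stores A[i][j] in row-major order. It says nothing about what streamRead does on the GPU, which is the part I am asking about.

```c
#include <stddef.h>

#define ROWS 3   /* illustrative size, not from the user guide */
#define COLS 4   /* illustrative size, not from the user guide */

/* Offset, in elements, of A[row][col] in a row-major array with ncols columns. */
size_t flat_index(size_t row, size_t col, size_t ncols)
{
    return row * ncols + col;
}

/* Returns 1 if every A[i][j] sits at its row-major byte offset from the
   start of A, which is how a C 2-D array is laid out in CPU memory. */
int cpu_layout_is_row_major(void)
{
    float A[ROWS][COLS];
    const char *base = (const char *)A;

    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            size_t actual = (size_t)((const char *)&A[i][j] - base);
            size_t expected = flat_index(i, j, COLS) * sizeof(float);
            if (actual != expected)
                return 0;
        }
    }
    return 1;
}
```

For example, flat_index(1, 2, 4) is 6, so on the CPU A[1][2] is the seventh float in memory.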
Clearing this up will help me understand the mat_mul code in the user guide, because that code gives me the impression that C = A's column * B's row. Thanks in advance.