dns.on.gpu Jul 14, 2016 6:21 AM (in response to alexfd7)
The textbook definition of matrix-matrix multiplication that you
are using is most unsuitable for both cpu and gpu calculations
unless the matrices are small.
Do a web search with the terms: tiled matrix multiplication
Implementing a tiled matrix-matrix multiplication on GPUs for
matrices of sizes other than powers of 2 is complicated,
but the gains in performance are impressive.
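
To give a rough idea of what "tiled" means, here is a minimal CPU-side sketch in plain C. It is not an OpenCL kernel and not code from this thread. The function names matmul_naive and matmul_tiled and the tile parameter are made up for illustration, and row-major storage is assumed. The min_sz clamp is one simple way to handle sizes that are not a multiple of the tile edge.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; used to clamp a tile at the matrix edge. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Textbook triple loop: C (m x n) = A (m x k) * B (k x n), row-major. */
void matmul_naive(const float *a, const float *b, float *c,
                  size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; p++)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Tiled version: same result, but the work is split into blocks of at
 * most tile x tile elements so that each block of A and B is reused
 * while it is still in cache. `tile` is a hypothetical tuning knob. */
void matmul_tiled(const float *a, const float *b, float *c,
                  size_t m, size_t k, size_t n, size_t tile)
{
    assert(tile > 0);

    /* C is accumulated block by block, so it must start at zero. */
    for (size_t i = 0; i < m * n; i++)
        c[i] = 0.0f;

    for (size_t i0 = 0; i0 < m; i0 += tile) {
        size_t i1 = min_sz(i0 + tile, m);          /* clamp at last row */
        for (size_t p0 = 0; p0 < k; p0 += tile) {
            size_t p1 = min_sz(p0 + tile, k);      /* clamp inner dim */
            for (size_t j0 = 0; j0 < n; j0 += tile) {
                size_t j1 = min_sz(j0 + tile, n);  /* clamp at last col */

                /* Multiply one block of A by one block of B. */
                for (size_t i = i0; i < i1; i++) {
                    for (size_t p = p0; p < p1; p++) {
                        float aip = a[i * k + p];
                        for (size_t j = j0; j < j1; j++)
                            c[i * n + j] += aip * b[p * n + j];
                    }
                }
            }
        }
    }
}
```

On a GPU the same idea is usually expressed by having each work-group copy one tile of A and one tile of B into local memory before multiplying them, which is where the edge handling gets more involved than the simple clamp used here.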


alexfd7 Jul 14, 2016 11:31 AM (in response to dns.on.gpu)
Hello, thanks for answering,
My question is about the difference in runtime when I change the NDRange local size.
Why are the timings so different? I get 85 seconds using "NDRange localThreads (256, 1)" and 3 seconds using "NDRange localThreads (16, 16)".
