We have an issue with the simple_matmult program from the samples package that we have not been able to explain so far:
With verification enabled, the results of the matrix-matrix multiplication computed on the CPU and on the GPU do not match for certain matrix sizes.
For example, 64x64 and 128x128 always return PASSED, while a y size of 65 or 129 returns FAILED. The issue does not depend on only one of the sizes: any x size combined with a y size of 128 returns PASSED, but this is not true for a y size of 1024.
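
To make clear what kind of check is meant, here is a minimal sketch of a CPU reference multiplication with a tolerance-based comparison, written in Python for brevity. It is not the sample's own verification code. The names cpu_matmul and find_mismatches and the tolerance values are assumptions made for illustration only.

```python
import math


def cpu_matmul(a, b, m, k, n):
    """Reference product of row-major a (m x k) and b (k x n) as a flat list."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c


def find_mismatches(ref, out, rows, cols, rel_tol=1e-5, abs_tol=1e-6):
    """Return (row, col, ref_value, out_value) for each element outside tolerance.

    The tolerances are illustrative assumptions, not values from the sample.
    """
    bad = []
    for i in range(rows):
        for j in range(cols):
            r = ref[i * cols + j]
            o = out[i * cols + j]
            if not math.isclose(r, o, rel_tol=rel_tol, abs_tol=abs_tol):
                bad.append((i, j, r, o))
    return bad
```

Listing the row and column of each mismatch, rather than only PASSED or FAILED, would show whether the wrong elements cluster in the last rows or columns or are spread over the whole matrix. The former might hint at an edge-handling problem, the latter at rounding differences.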
Do you have any idea what could cause this?
Thanks for any hints and answers,