Running simple_matmult with certain matrix sizes returns "FAILED" at verification
We have an issue with the simple_matmult program from the samples package that we have not been able to explain so far:
With verification enabled, the results of the matrix-matrix multiplication computed on the CPU and on the GPU do not match for certain matrix sizes.
For example, 64x64 and 128x128 always return PASSED, while a y size of 65 or 129 returns FAILED. The issue does not depend on only one of the two sizes: any x size combined with a y size of 128 returns PASSED, but this is not true for a y size of 1024.