Hi all,

I need an SGEMM implementation on the GPU, so I've tried AMD's clAmdBlasSgemm (Windows 7 Pro 64-bit, clAmdBlas 1.2.78, Radeon HD5770, driver 11.5, SDK 2.4). While comparing results I've noticed that they are not correct. The sample "example_sgemm.c" that ships with the library computes C = 10*A*B + 20*C, with:

A[] = {
11, 12, 13, 14, 15,
21, 22, 23, 24, 25,
31, 32, 33, 34, 35,
41, 42, 43, 44, 45}

B[] = {
11, 12, 13,
21, 22, 23,
31, 32, 33,
41, 42, 43,
51, 52, 53}

C[] = {
11, 12, 13,
21, 22, 23,
31, 32, 33,
41, 42, 43}

The library returns:

14420, 15790, 17060,
25120, 27490, 29760,
35820, 39190, 42460,
46520, 50890, 55160

whereas the correct result is:

21370, 22040, 22710,
37070, 38240, 39410,
52770, 54440, 56110,
68470, 70640, 72810

I've also tested other inputs; here's a simple one:

A[] = {
1, 2, 3,
1, 2, 3,
1, 2, 3}

B[] = {
1, 1, 1,
2, 2, 2,
3, 3, 3}

With transA = transB = clAmdBlasNoTrans, M = N = K = lda = ldb = ldc = 3, alpha = 1 and beta = 0, the result is:

6, 12, 18,
6, 12, 18,
6, 12, 18

whereas it should be 14 in every entry. Am I not using it correctly, or is something else going on?
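Working one entry out by hand, and assuming both arrays are read row by row as they are typed: every row of A is (1, 2, 3) and every column of B is (1, 2, 3)^T, so each entry of A*B is the same dot product:

```latex
c_{ij} = \sum_{p=1}^{3} a_{ip}\, b_{pj} = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 = 14, \qquad i, j = 1, 2, 3
```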

First of all, when you write:

A[] = { series of numbers }

is that how you enter them in C/C++, or how they are output from a math program like Matlab?

Matrices in BLAS are stored in column-major order. A matrix like this:

1 2 3
4 5 6

is stored in linear memory like this:

1 4 2 5 3 6

If I try to multiply your matrices, the dimensions do not match. They need to satisfy:

A [m x k] * B [k x n] = C [m x n]

Your A suggests you intend it to be [4 x 5]. If you pass that array to sgemm, A actually looks like this:

11 15 24 33 42
12 21 25 34 43
13 22 31 35 44
14 23 32 41 45

Also, your last example should not give 14 in every entry, but it should not give what you showed either.