
dmeiser
Elite

optimization of clAmdBlas for tahiti

Hi,

I've run a benchmark that does dense matrix-matrix multiplication (the dgemm operation in BLAS level 3) in double precision on a Radeon 7970 GPU.  When I use the dgemm function provided in clAmdBlas I measure about 150 GFLOP/s.  When I run the same benchmark using ViennaCL I get about 220 GFLOP/s, i.e. it's significantly faster. Could this be an issue of clAmdBlas not being tuned for Tahiti yet? As far as I can tell the kernels in clAmdBlas are precompiled into the .so file.  Is it possible that the compiler wasn't as well tuned when the .so file was generated (December, based on the clAmdBlas release date) as it is now? ViennaCL compiles its kernels from source.

Another reason why I suspect this could be a compiler optimization issue is that in single precision I get slightly better performance on a 6870 than I do on the 7970.

Shouldn't I expect a significant fraction of the theoretical peak flops in both double precision and single precision for this ALU bound computation?
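To put the numbers above in context, here is a rough bash sketch of how GEMM throughput and the fraction of peak can be computed. The function names are made up for illustration, and the peak figure is an assumption derived from the 7970's published specs (2048 lanes, 925 MHz, double precision at 1/4 of the single-precision rate), not something measured here.

```shell
#!/usr/bin/env bash
# GEMM on an MxK by KxN problem performs 2*M*N*K floating-point operations
# (one multiply and one add per inner-product term).
gemm_gflops() {  # usage: gemm_gflops M N K seconds
  awk -v m="$1" -v n="$2" -v k="$3" -v t="$4" \
    'BEGIN { printf "%.1f\n", 2 * m * n * k / t / 1e9 }'
}

percent_of_peak() {  # usage: percent_of_peak measured_gflops peak_gflops
  awk -v g="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * g / p }'
}

# Assumed double-precision peak for the 7970:
# 2048 lanes * 2 flops (FMA) * 0.925 GHz = 3788.8 GFLOP/s single, / 4 = 947.2.
PEAK_DP=947.2

percent_of_peak 150 "$PEAK_DP"   # clAmdBlas figure from the post
percent_of_peak 220 "$PEAK_DP"   # ViennaCL figure from the post
```

Under that assumed peak, both measurements land well below a quarter of what the hardware could deliver in theory, which is why the question about tuning seems fair.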

Thanks a lot in advance.

Dominic

0 Likes
1 Solution
solver
Adept I

Are you talking about DGEMM kernel performance, or performance measured on the CPU side including data transfer between CPU and GPU? I assume you mean DGEMM kernel performance.

The DGEMM kernel doesn't work well with the current driver (8.921), which is available on AMD's public web site. AMD will release a new driver soon. DGEMM runs much faster (2.5X) with this new driver.

0 Likes
8 Replies

Thanks for your post, Dominic. One of our BLAS engineers has taken a look at your post. We'll respond soon.

0 Likes

Bragadeesh,

Thanks for your response.  In case it helps, I have also tried a Radeon 5970, which gives very similar performance to the other two cards (7970 and 6870) in single precision.

Looking forward to an answer from the BLAS engineers.

Cheers,

Dominic

0 Likes

Are you talking about the TN variant?

0 Likes

Dear sarnath,

I'm talking about the NN kernel.  I did try all other combinations (TN, NT, TT) and got similar results.

Cheers,

Dominic

0 Likes
Dear solver,

Thanks a lot for your answer.

I was comparing just kernel performance.  All buffers are transferred to the GPU before the kernel is launched.  Once the new driver is available, will I have to update clAmdBlas, or should dgemm just be faster automatically with the clAmdBlas library I'm currently using? Any estimate of when this new driver is going to be available?

Thanks again.

Dominic

0 Likes

The new driver will probably be available on Feb. 29. 

You may be able to improve DGEMM performance on the current driver by running clAmdBlasTune, found under the directory src/tools/tune.

Assuming you are using bash, first type the following command in the terminal:

export AMD_CLBLAS_STORAGE_PATH=your_kdb_directory

Then type 'clAmdBlasTune --gemm --double --store-kernel' in the terminal.

clAmdBlas will tune the parameters and dump the data to a file named Tahiti.kdb located in the directory 'your_kdb_directory'.
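The tuning steps can be collected into a small bash script. This is only a sketch: the `KDB_DIR` variable and the `clamdblas_kdb` directory name are made up for illustration, and it assumes clAmdBlasTune has already been built and is on your PATH. Only the environment variable and the flags come from the instructions in this post.

```shell
#!/usr/bin/env bash
# Hypothetical location for the tuning database; any writable directory should do.
KDB_DIR="${KDB_DIR:-$HOME/clamdblas_kdb}"
mkdir -p "$KDB_DIR"

# The tuner writes its .kdb file into this directory.
export AMD_CLBLAS_STORAGE_PATH="$KDB_DIR"

if command -v clAmdBlasTune >/dev/null 2>&1; then
  # Tune double-precision GEMM and store the resulting kernels.
  clAmdBlasTune --gemm --double --store-kernel
else
  echo "clAmdBlasTune not found on PATH (look under src/tools/tune)" >&2
fi
```

If you want the variable to stay set for a benchmark run in the same shell, source the script instead of executing it. Presumably the application also needs AMD_CLBLAS_STORAGE_PATH set at run time to pick up the tuned data, but check that against the clAmdBlas documentation.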

0 Likes
technomeet
Journeyman III

Can you explain this to me briefly? I'm new here.

0 Likes