yurtesen
Miniboss

minor problems with clAmdBlasTune

There seem to be some minor problems with clAmdBlasTune:

1- The help text mislabels the data types:

Used data types:

   --float

       Single float version of functions.

   --complex

       Double float version of functions.

   --double

       Single complex float version of functions.

   --double-complex

       Double complex float version of functions

2- --store-kernel is not functioning?

# clAmdBlasTune --store-kernel

Unknown argument --store-kerne

3- It fails on AMD Radeon HD 6320 Graphics (clinfo output attached):

# clAmdBlasTune

GEMV is being tuned, progress:  0.07% clEnqueueNDRangeKernel() failed with -5(CL_OUT_OF_RESOURCES)

0 Likes
yurtesen
Miniboss

It appears the option is --store-kernels and not --store-kernel, but the help text shows the wrong name.

0 Likes

Now I think that it does not complete at all... It prints 3.12% and exits back to the shell. Any ideas? (The card is a Tahiti card.)

$ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune --gemm --double

GEMM is being tuned, progress:  3.12% $

0 Likes

I have been able to run the tuning program on my Tahiti card, and I know that the MAGMA folks from UTK can also.  Are you running the latest drivers?  Catalyst 12.4?  Make sure that you are using the OpenCL libraries from the most recent driver, not from any APP SDK that you might have installed on your machine.

0 Likes
kknox
Staff

Hi Yurtesen,

I have fixed issues 1 and 2 in our trunk.

#3 looks to be a bug in our clAmdBlasTune program.  The OpenCL runtime will oftentimes delay memory allocation as long as possible, so a clEnqueueNDRangeKernel call can be the point where the buffer is actually allocated and the memory transferred, which is why the error code is returned there.  It looks like on your small integrated card, clAmdBlasTune is not properly handling the CL_OUT_OF_RESOURCES return code.
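As a rough illustration of what "handling the return code" could look like, here is a minimal Python sketch. It is not how clAmdBlasTune is implemented. The numeric status values are the ones defined in the OpenCL headers; `enqueue` is a hypothetical stand-in for the kernel launch, and `tune_with_fallback` is a name made up for this example. The idea is simply to skip problem sizes the device cannot hold instead of aborting the whole run.

```python
# Status codes as defined in the OpenCL headers (CL/cl.h).
CL_SUCCESS = 0
CL_DEVICE_NOT_FOUND = -1
CL_MEM_OBJECT_ALLOCATION_FAILURE = -4
CL_OUT_OF_RESOURCES = -5

CL_STATUS_NAMES = {
    CL_SUCCESS: "CL_SUCCESS",
    CL_DEVICE_NOT_FOUND: "CL_DEVICE_NOT_FOUND",
    CL_MEM_OBJECT_ALLOCATION_FAILURE: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    CL_OUT_OF_RESOURCES: "CL_OUT_OF_RESOURCES",
}


def status_name(code):
    """Return the symbolic name for a status code, if this sketch knows it."""
    return CL_STATUS_NAMES.get(code, f"unknown OpenCL status {code}")


def tune_with_fallback(enqueue, sizes):
    """Try each problem size, largest first, and skip sizes that do not fit.

    `enqueue` is a hypothetical callable standing in for the kernel launch:
    it takes a problem size and returns an OpenCL status code.
    Returns the list of sizes that ran successfully.
    """
    completed = []
    for size in sorted(sizes, reverse=True):
        status = enqueue(size)
        if status == CL_SUCCESS:
            completed.append(size)
        elif status in (CL_OUT_OF_RESOURCES, CL_MEM_OBJECT_ALLOCATION_FAILURE):
            # Too large for this device: move on to the next smaller size.
            continue
        else:
            # Anything else is unexpected and should not be silently ignored.
            raise RuntimeError(status_name(status))
    return completed
```

Because allocation may be deferred until the kernel is enqueued, the out-of-memory condition has to be checked at the launch, not only where the buffers are created.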

Hello, thanks for the quick reply. I am not worried about my Fusion APU, but it would be nice to get it fixed. I thought clAmdBlas was allocating too much memory. Perhaps allow the user to enter a matrix size too? Or auto-detect the available memory and size the tests accordingly? (It would probably make sense for a 1GB Cypress to run smaller tests than a 3GB Tahiti.)
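The memory-based sizing suggested above can be sketched with a little arithmetic. This is only an illustration of the idea, not something clAmdBlasTune does: the function name, the choice of three square matrices (as in a GEMM call with A, B and C), and the 50% memory budget are all assumptions made for the example.

```python
import math


def max_square_dim(mem_bytes, elem_bytes=8, n_matrices=3, fraction=0.5):
    """Largest N such that n_matrices N x N matrices fit in the memory budget.

    mem_bytes  -- total device memory in bytes
    elem_bytes -- bytes per element (8 for double, 4 for float)
    fraction   -- assumed share of device memory the test may use
    """
    budget = int(mem_bytes * fraction)
    # n_matrices * N^2 * elem_bytes <= budget, solved for the largest integer N
    return math.isqrt(budget // (n_matrices * elem_bytes))


GIB = 1 << 30
print("1 GiB card:", max_square_dim(1 * GIB))
print("3 GiB card:", max_square_dim(3 * GIB))
```

With the same budget rule, the 3GB card gets a larger maximum test size than the 1GB card, which is the behaviour asked for.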

4- I have different problems now. On Cypress the tests do not seem to end at 100%

$ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune

GEMV is being tuned, progress: 100.00%
SYMV is being tuned, progress: 50.00%
GEMM is being tuned, progress: 21.88%
TRMM is being tuned, progress: 23.44%
TRSM is being tuned, progress: 100.00%
SYRK is being tuned, progress: 50.00%
SYR2K is being tuned, progress: 50.00% $

If I run it again, it prints the same output and exits.

   a) Does this mean the tests were completed or not?

   b) How can I be sure that it uses the best kernel from the tests when I run a program next time?
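For comparing runs like the one above, a small Python helper can pull the last reported percentage per routine out of the tuner's console output. This is only a sketch: it assumes the "<NAME> is being tuned, progress: <N>%" wording shown in this thread, and it says nothing about what is actually stored in the tuning database.

```python
import re

# Matches e.g. "GEMM is being tuned, progress: 21.88%"
PROGRESS_RE = re.compile(r"(\w+) is being tuned, progress:\s*([\d.]+)%")


def parse_progress(output):
    """Map each routine name to the last percentage printed for it."""
    return {name: float(pct) for name, pct in PROGRESS_RE.findall(output)}


def incomplete(output):
    """Return the routines whose last reported progress is below 100%."""
    progress = parse_progress(output)
    return sorted(name for name, pct in progress.items() if pct < 100.0)


log = (
    "GEMV is being tuned, progress: 100.00% "
    "SYMV is being tuned, progress: 50.00% "
    "GEMM is being tuned, progress: 21.88% "
    "TRMM is being tuned, progress: 23.44% "
    "TRSM is being tuned, progress: 100.00% "
    "SYRK is being tuned, progress: 50.00% "
    "SYR2K is being tuned, progress: 50.00%"
)
print(incomplete(log))
```

Saving the output of each run and diffing the parsed dictionaries makes it easy to see whether a rerun moved any routine forward.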

5- The --store-kernels option causes a segmentation fault on Cypress.

$ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune --store-kernels

GEMV is being tuned, progress:  6 9.38% Segmentation fault (core dumped)

$

Every time I run it, the percentage increases a bit but then it crashes. I guess it crashes when trying to store the kernel? (So the operation does not work at all...)

6- Is the tuning program supposed to function only on AMD devices, or can the library be tuned to function with Nvidia GPUs also?

0 Likes

4- I have different problems now. On Cypress the tests do not seem to end at 100%

This needs investigation; the progress meter should go to 100% for every test.  I'll make a bug report in our tracker software, but I imagine that the .kdb database that is built contains valid data for the percentages that you did get.

5- The --store-kernels option is causing Cypress to segmentation fault

This needs investigation.  Can you get a log of where it seg faults running under a debugger?

6- Is the tuning program supposed to function only on AMD devices, or can the library be tuned to function with Nvidia GPUs also?

In theory, the tuning program should work on all OpenCL devices (except CPUs).  However, we just tried this in our lab today and we got a -1 from clGetDeviceIDs().  This also needs investigation.

Thanks for the feedback.

0 Likes

4- It is sad that this is working so badly, because I got a 25% speedup in the sgemm call, but the dgemm results are exactly the same. Now I am not sure whether those were already the best settings or whether it is because the tuning did not reach 100%.

Now I am running the tune program on Tahiti and it does not reach 100% either (but isn't it strange that it stops at the 25% and 50% marks?)

SYMV is being tuned, progress: 50.00%
GEMM is being tuned, progress: 25.00%
TRMM is being tuned, progress: 25.00%

6-

kknox wrote:

function with Nvidia GPUs also?

In theory, the tuning program should work on all OpenCL devices (except CPUs).  However, we just tried this in our lab today and we got a -1 from clGetDeviceIDs().  This also needs investigation.

I think (and it is a wild guess) that this is yet another problem with how you detect the devices. If the first platform (and I don't know how the system orders them) is a CPU platform, then it gives the error you mention because it is not able to find a GPU.

For example, if you have a box with both the AMD SDK and the Nvidia SDK installed but only an Nvidia card, the AMD SDK can take first place in the platform list and return only CPUs to the tuning program (therefore -1 on a GPU device query).
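The guess above can be illustrated with a small Python model. This is not the tuner's actual device detection, and it uses plain data instead of real OpenCL calls. In the OpenCL headers, -1 is CL_DEVICE_NOT_FOUND, which clGetDeviceIDs() returns when a platform has no device of the requested type. The device names and both function names below are hypothetical.

```python
# Each platform is modelled as (platform_name, [(device_name, device_type), ...]).
# This stands in for what clGetPlatformIDs()/clGetDeviceIDs() would report.
platforms = [
    ("AMD Accelerated Parallel Processing", [("Host CPU", "CPU")]),
    ("NVIDIA CUDA", [("Some GeForce card", "GPU")]),
]


def gpus_on_first_platform(platforms):
    """Query only the first platform: the suspected behaviour."""
    _, devices = platforms[0]
    return [name for name, kind in devices if kind == "GPU"]


def gpus_on_all_platforms(platforms):
    """Walk every platform and collect (platform, device) pairs for GPUs."""
    found = []
    for platform_name, devices in platforms:
        for device_name, kind in devices:
            if kind == "GPU":
                found.append((platform_name, device_name))
    return found


print(gpus_on_first_platform(platforms))  # empty: corresponds to the -1 case
print(gpus_on_all_platforms(platforms))   # finds the GPU on the second platform
```

If the tuner only looks at the first platform, a CPU-only platform listed first would hide the GPU exactly as described.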

Anyway, I could get it to work on Nvidia also, but it crashes after a while. I think I will come back to that after we fix the AMD-related problems.

5- I could run it under valgrind if that would help, but right now I am running the tuning on Tahiti, so I have to do that later. I can only have one card installed at a time, so I tune the cards one after another.

0 Likes

What driver and SDK are you using?

For this Cypress system, please attach the clinfo output, and I think you are on Linux, so go ahead and attach the getatisystemreport file as well.

0 Likes

I will get back to you with that info when I switch back to Cypress. If Tahiti segfaults too, I can report on that instead.

0 Likes

The clinfo output from Tahiti is attached. I am on Linux, and I vaguely remember this atisysteminfo script which collected information somehow, but I couldn't find it anywhere. Do you know if it still exists, and where?

APP SDK 2.7 and Catalyst 12.4 are used on this system...

Problem 4 on Tahiti as well. Are you saying that when you run this on your cards it goes until 100%?

$ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune

GEMV is being tuned, progress: 100.00%
SYMV is being tuned, progress: 50.00%
GEMM is being tuned, progress: 25.00%
TRMM is being tuned, progress: 25.00%
TRSM is being t
is being tuned, progress: 50.00%
SYR2K is being tuned, progress: 50.00% $

Problem 5 also on Tahiti

I have attached the valgrind output (I re-ran with valgrind). If you want, I can also do the same with Cypress (if you think it might be a different error there).

$ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune --store-kernels

GEMV is being tuned, progress:  3.12% Segmentation fault (core dumped)

$

Do you need any other info?

0 Likes

Well, it would at least be nice if the tuner could reach 100%. I am trying to do benchmarks, but it is difficult since this does not seem to be functioning. Anything new about these issues? Do you see them too?

0 Likes

Hi yurtesen~

Sorry that you have not heard from me; we are investigating, but have nothing yet to report.

In the meantime, if you download the .2 version of clMAGMA:

http://icl.cs.utk.edu/magma/software/index.html

The tarball includes a database for Tahiti, which I believe ran to completion.  Try that and see if it improves your DGEMM score.  Be aware that DGEMM has a lot less 'wiggle' room, as it's much easier to saturate the compute resources.

Kent

0 Likes

I will try that, but are you able to run a complete clAmdBlasTune session on your systems? I tried on several machines and it failed one way or another on all of them. That makes me wonder how those guys at utk.edu could complete it.

Also, why does AMD not provide downloadable profiles for these libraries, tuned for architectures like Tahiti etc., the same way they have game profiles? (Does it make sense for us to run the tuning ourselves?)

0 Likes

Yes, we are able to reproduce this issue, but can't say much else right now.

For your DGEMM performance, you should be able to get around 600 Gflops for a matrix of size 2Kx2K on Tahiti.

I've heard several requests for downloadable tuning databases, but it wasn't something we were planning on distributing.  In concept, we would want each individual user to generate their own tuning files for their own computer, as each individual computer has its own configuration.  For now, there are no plans to distribute tuning files, but I will make a note of it in our tracker for possible future consideration.

Kent
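To put the 600 Gflops figure in context, a benchmark timing can be converted with the usual 2*N^3 operation count for a square GEMM. This is a minimal Python sketch; it assumes "2Kx2K" means N = 2048, and the function names are made up for the example.

```python
def gemm_gflops(n, seconds):
    """Achieved GFLOP/s for an N x N x N GEMM, counting 2*N^3 operations."""
    return 2.0 * n ** 3 / seconds / 1e9


def expected_seconds(n, gflops):
    """Time one N x N x N GEMM would take at the given GFLOP/s rate."""
    return 2.0 * n ** 3 / (gflops * 1e9)


n = 2048  # assumption: "2K" taken as 2048
t = expected_seconds(n, 600.0)
print(f"one {n}x{n} DGEMM at 600 GFLOP/s takes about {t * 1e3:.1f} ms")
print(f"check: {gemm_gflops(n, t):.0f} GFLOP/s")
```

A single call at that rate lasts only a few tens of milliseconds, so timing many repetitions and averaging gives a steadier number than timing one call.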

0 Likes

kknox wrote:

Yes, we are able to reproduce this issue, but can't say much else right now.

Well, I guess I will wait for an updated version. I would gladly test it when it is out. This sort of makes me think that clMAGMA is probably distributing an incomplete database...

0 Likes

Well, I tried the kdb from clMAGMA and it gives exactly the same results. But then, I only have a benchmark program for DGEMM and SGEMM at this point.

I realized that clAmdBlasTune automatically resumes from where it left off if restarted. Is it possible to make a program which tells how many of the optimizations were completed by reading the kdb file? (That could sometimes come in handy.)

0 Likes

I realized that clAmdBlasTune automatically resumes from where it left off if restarted. Is it possible to make a program which tells how many of the optimizations were completed by reading the kdb file? (That could sometimes come in handy.)

No, not at this time.

0 Likes