17 Replies Latest reply on Jun 4, 2012 4:32 PM by yurtesen

    minor problems with clAmdBlasTune

    yurtesen

      There seem to be some minor problems with clAmdBlasTune.

       

       

      1- Something funny at help:

      Used data types:

         --float

             Single float version of functions.

         --complex

             Double float version of functions.

         --double

             Single complex float version of functions.

         --double-complex

             Double complex float version of functions

       

      2- --store-kernel is not functioning?

      # clAmdBlasTune --store-kernel

      Unknown argument --store-kerne

       

      3- Fails on AMD Radeon HD 6320 Graphics (clinfo output attached):

      # clAmdBlasTune

      GEMV is being tuned, progress:  0.07% clEnqueueNDRangeKernel() failed with -5(CL_OUT_OF_RESOURCES)

        • Re: minor problems with clAmdBlasTune
          yurtesen

          It appears the option is --store-kernels, not --store-kernel, but the help text shows the wrong name.

          • Re: minor problems with clAmdBlasTune
            kknox

            Hi Yurtesen,

             

            I have fixed issues 1 and 2 in our trunk.

             

            #3 looks to be a bug in our clAmdBlasTune program.  The OpenCL runtime will often delay memory allocation as long as possible, so a clEnqueueNDRangeKernel call can actually allocate the buffer and transfer memory, which explains the return code.  It looks like on your small integrated card, clAmdBlasTune is not properly handling the CL_OUT_OF_RESOURCES return code.
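To illustrate the handling described above: a tuning loop can treat CL_OUT_OF_RESOURCES as "this problem size does not fit on the device" and move on instead of aborting. The sketch below is Python and only models the control flow. `tune_sizes` and `run_kernel` are hypothetical names, not part of clAmdBlas, and `run_kernel` stands in for the real clEnqueueNDRangeKernel call. Only the two status values come from the OpenCL headers.

```python
CL_SUCCESS = 0
CL_OUT_OF_RESOURCES = -5  # numeric value defined in the OpenCL headers


def tune_sizes(sizes, run_kernel):
    """Run every problem size, skipping those the device cannot hold.

    run_kernel is a hypothetical callable that launches one tuning kernel
    for a size n and returns the OpenCL status code of the launch.
    """
    completed, skipped = [], []
    for n in sizes:
        status = run_kernel(n)
        if status == CL_SUCCESS:
            completed.append(n)
        elif status == CL_OUT_OF_RESOURCES:
            # Allocation may be deferred until launch, so the failure
            # shows up here; record the size and continue.
            skipped.append(n)
        else:
            raise RuntimeError(f"kernel launch failed with status {status}")
    return completed, skipped
```

With handling along these lines, a small integrated GPU would leave the larger sizes untuned rather than end the whole run at the first failed launch.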

              • Re: minor problems with clAmdBlasTune
                yurtesen

                Hello, thanks for the quick reply. I am not worried about my Fusion APU, but it would be nice to get it fixed. I thought clAmdBlas was allocating too much memory. Perhaps allow the user to enter a matrix size too? Or auto-detect the available memory and size the tests accordingly? (It would probably make sense for a 1GB Cypress to run smaller tests than a 3GB Tahiti.)
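The auto-detect idea can be sketched as simple arithmetic: pick the largest square matrix dimension such that all operand matrices fit in some fraction of device memory. This is only an illustration in Python, not how clAmdBlasTune sizes its tests. `max_square_dim`, the three-matrix assumption (A, B and C of a GEMM) and the 50% budget are all assumptions made for the example. In real code the memory size would come from clGetDeviceInfo with CL_DEVICE_GLOBAL_MEM_SIZE.

```python
import math


def max_square_dim(mem_bytes, elem_bytes=4, num_matrices=3, usable_fraction=0.5):
    """Largest N such that num_matrices N x N matrices fit in the budget.

    elem_bytes is 4 for single precision and 8 for double precision.
    usable_fraction is an assumed safety margin, not a measured value.
    """
    budget = mem_bytes * usable_fraction
    elems_per_matrix = int(budget // (num_matrices * elem_bytes))
    return math.isqrt(elems_per_matrix)


GIB = 2**30
print(max_square_dim(1 * GIB))  # a 1GB card
print(max_square_dim(3 * GIB))  # a 3GB card
```

Under these assumptions the 1GB card tops out at a smaller N than the 3GB card, which is the behaviour the suggestion asks for.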

                 

                4- I have different problems now. On Cypress the tests do not seem to end at 100%

                $ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune

                GEMV is being tuned, progress: 100.00%
                SYMV is being tuned, progress: 50.00%
                GEMM is being tuned, progress: 21.88%
                TRMM is being tuned, progress: 23.44%
                TRSM is being tuned, progress: 100.00%
                SYRK is being tuned, progress: 50.00%
                SYR2K is being tuned, progress: 50.00%
                $

                If I run it again, it prints the same output and exits.

                   a) Does this mean the tests were completed or not?

                   b) How can I be sure that it uses the best kernel from the tests when I run a program next time?
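One way to see at a glance which routines stopped short is to parse the progress text that the tuner prints. The Python sketch below only reads that console output in the format shown above. It does not inspect the .kdb file, whose format is not described in this thread, and `parse_progress` and `incomplete` are hypothetical helper names.

```python
import re

# Matches e.g. "GEMM is being tuned, progress: 21.88%"
PROGRESS_RE = re.compile(r"(\w+) is being tuned, progress:\s*([\d.]+)%")


def parse_progress(output):
    """Map each routine name to the last percentage printed for it."""
    last = {}
    for name, pct in PROGRESS_RE.findall(output):
        last[name] = float(pct)
    return last


def incomplete(output):
    """Return only the routines whose last reported progress is below 100%."""
    return {name: pct for name, pct in parse_progress(output).items() if pct < 100.0}
```

Feeding it the captured output of a run (for example via a shell pipe into a small script) lists the routines that did not reach 100%.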

                 

                5- The --store-kernels option is causing Cypress to segmentation fault.

                $ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune --store-kernels

                GEMV is being tuned, progress:  6 9.38% Segmentation fault (core dumped)

                $

                Every time I run it, the percentage increases a bit, but then it crashes. I guess it crashes when trying to store the kernel? (So the option does not work at all...)

                 

                6- Is the tuning program supposed to function only on AMD devices, or can the library be tuned to function with Nvidia GPUs also?

                  • Re: minor problems with clAmdBlasTune
                    kknox

                    4- I have different problems now. On Cypress the tests do not seem to end at 100%

                    This needs investigation; the progress meter should go to 100% for every test.  I'll make a bug report in our tracker software, but I imagine that the .kdb database that is built contains valid data for the percentages that you did get.

                    5- The --store-kernels option is causing Cypress to segmentation fault

                    This needs investigation.  Can you get a log of where it seg faults running under a debugger?

                    6- Is the tuning program supposed to function only on AMD devices, or can the library be tuned to function with Nvidia GPUs also?

                    In theory, the tuning program should work on all OpenCL devices (except CPUs).  However, we just tried this in our lab today and we got a -1 from clGetDeviceIDs().  This also needs investigation.

                     

                    Thanks for the feedback.

                      • Re: minor problems with clAmdBlasTune
                        yurtesen

                        4- It is sad that this is working so badly, because I got a 25% speedup in the sgemm call, but the dgemm results are exactly the same. Now I am not sure whether dgemm already had the best settings or whether it is because tuning did not reach 100%.

                         

                        Now I am running the tune program on Tahiti and it does not reach 100% there either (but isn't it strange that it stops at the 25% and 50% marks?)

                         

                        SYMV is being tuned, progress: 50.00%
                        GEMM is being tuned, progress: 25.00%
                        TRMM is being tuned, progress: 25.00%

                         

                         

                        6-

                        kknox wrote:

                        function with Nvidia GPUs also?

                        In theory, the tuning program should work on all OpenCL devices (except CPUs).  However, we just tried this in our lab today and we got a -1 from clGetDeviceIDs().  This also needs investigation.

                        I think (and it is a wild guess) that this is yet another problem with how you detect the devices. If the first platform (and I don't know how the system orders them) is a CPU-only platform, then it gives the error you mention because it cannot find a GPU there.

                         

                        For example, if you have a box with both the AMD SDK and the Nvidia SDK installed but only an Nvidia card, the AMD SDK can take first place in the platform list and return only CPUs to the tuning program (hence the -1 when asking for a GPU device).
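The guess above can be modelled in a few lines. The Python sketch below uses plain lists in place of the real clGetPlatformIDs and clGetDeviceIDs calls, so it shows only the selection logic: looking at the first platform alone finds nothing, while scanning every platform finds the GPU. The function names and the platform and device labels are hypothetical.

```python
def find_gpu(platforms):
    """Return (platform_index, device_name) of the first GPU on any platform.

    platforms is a list of (platform_name, devices) pairs, where devices is
    a list of (device_name, device_type) pairs with device_type "CPU" or "GPU".
    """
    for index, (_name, devices) in enumerate(platforms):
        for device_name, device_type in devices:
            if device_type == "GPU":
                return index, device_name
    # Analogue of getting CL_DEVICE_NOT_FOUND (-1) on every platform.
    return None


def find_gpu_first_platform_only(platforms):
    """The suspected behaviour: look at platform 0 and nothing else."""
    return find_gpu(platforms[:1])


# Hypothetical box: AMD SDK listed first (CPU only), Nvidia SDK second.
box = [
    ("AMD SDK", [("host CPU", "CPU")]),
    ("Nvidia SDK", [("Nvidia card", "GPU")]),
]
print(find_gpu_first_platform_only(box))  # None
print(find_gpu(box))                      # (1, 'Nvidia card')
```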

                         

                        Anyway, I could get it to work on Nvidia also, but it crashes after a while. I think I will come back to that after we fix the AMD-related problems.

                         

                        5- I could run it under valgrind if that would help, but right now I am running the tuning on Tahiti, so I have to do that later. I install one card at a time so that I can tune for all the cards.

                      • Re: minor problems with clAmdBlasTune
                        kknox

                        What driver and SDK are you using?

                         

                        For this Cypress system, please attach the clinfo output. I think you are on Linux, so go ahead and attach the getatisystemreport file as well.

                          • Re: minor problems with clAmdBlasTune
                            yurtesen

                            I will get back to you with that info when I switch back to Cypress. If Tahiti segfaults too, I can report on that as well.

                            • Re: minor problems with clAmdBlasTune
                              yurtesen

                              clinfo from Tahiti is attached. I am on Linux, and I vaguely remember this atisysteminfo script that collected information somehow, but I couldn't find it anywhere. Do you know if it still exists, and where?

                               

                              APP SDK 2.7 and Catalyst 12.4 are used on this system...

                               

                              Problem 4 occurs on Tahiti as well. Are you saying that when you run this on your cards it goes all the way to 100%?

                              $ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune

                              GEMV is being tuned, progress: 100.00%
                              SYMV is being tuned, progress: 50.00%
                              GEMM is being tuned, progress: 25.00%
                              TRMM is being tuned, progress: 25.00%
                              TRSM is being t
                              is being tuned, progress: 50.00%
                              SYR2K is being tuned, progress: 50.00%
                              $

                               

                              Problem 5 also on Tahiti

                              I have attached the valgrind output (I re-ran with valgrind). If you want, I can also do the same with Cypress (if you think it might be a different error there?)

                              $ /opt/clAmdBlas-1.8.269/bin64/clAmdBlasTune --store-kernels

                              GEMV is being tuned, progress:  3.12% Segmentation fault (core dumped)

                              $

                              Do you need any other info?

                                • Re: minor problems with clAmdBlasTune
                                  yurtesen

                                  Well, it would at least be nice if the tune program could reach 100%. I am trying to do benchmarks, but it is difficult since this does not seem to be functioning. Is there anything new about these issues? Do you see them too?

                                    • Re: minor problems with clAmdBlasTune
                                      kknox

                                      Hi yurtesen~

                                       

                                      Sorry that you have not heard from me; we are investigating, but have nothing yet to report.

                                       

                                      In the meantime, if you download the .2 version of clMAGMA:

                                      http://icl.cs.utk.edu/magma/software/index.html

                                       

                                      They include a database for Tahiti in the tarball, which I believe ran to completion.  Try that and see if it improves your DGEMM score.  I should point out that DGEMM has a lot less 'wiggle' room, as it's much easier to saturate the compute resources.

                                       

                                      Kent

                                        • Re: minor problems with clAmdBlasTune
                                          yurtesen

                                          I will try that, but are you able to run a complete clAmdBlasTune session on your systems? I tried on several machines and it failed one way or another on all of them. That makes me wonder how those guys at utk.edu could complete it.

                                           

                                          Also, why doesn't AMD provide downloadable profiles for these libraries, tuned for architectures like Tahiti etc., the same way they have game profiles? (Does it make sense for us to run the tuning ourselves?)

                                            • Re: minor problems with clAmdBlasTune
                                              kknox

                                              Yes, we are able to reproduce this issue, but can't say much else right now.

                                               

                                              For your DGEMM performance, you should be able to get around 600 Gflops for a matrix of size 2Kx2K on Tahiti.
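For comparing a benchmark against that figure, GEMM throughput is conventionally computed from 2*M*N*K floating-point operations divided by the run time. A minimal Python sketch, assuming "2Kx2K" means M = N = K = 2048; `gemm_gflops` is a hypothetical helper name and the timing value is made up for illustration, not a measurement.

```python
def gemm_gflops(m, n, k, seconds):
    """GFLOP/s of an m x k by k x n matrix multiply that took `seconds`.

    Uses the conventional 2*m*n*k operation count (one multiply and one
    add per inner-product term).
    """
    return 2.0 * m * n * k / seconds / 1e9


# Illustrative timing only: a 2048^3 DGEMM finishing in 28.6 ms.
print(round(gemm_gflops(2048, 2048, 2048, 0.0286)))
```

At that size the whole multiply is about 17.2 GFLOP, so a rate of roughly 600 Gflops corresponds to a kernel time just under 29 ms.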

                                               

                                              I've heard several requests for downloadable tuning databases, but it wasn't something we were planning on distributing.  In concept, we would want each individual user to generate their own tuning files for their own computer, as each computer has its own configuration.  For now, there are no plans to distribute tuning files, but I will make a note of it in our tracker for possible future consideration.

                                               

                                              Kent

                                                • Re: minor problems with clAmdBlasTune
                                                  yurtesen

                                                  kknox wrote:

                                                   

                                                  Yes, we are able to reproduce this issue, but can't say much else right now.

                                                   

                                                  Well, I guess I will wait for an updated version. I would gladly test it when it is out. This sort of makes me think that clMAGMA is probably distributing an incomplete database...

                                              • Re: minor problems with clAmdBlasTune
                                                yurtesen

                                                Well, I tried the kdb from clMAGMA and it gives exactly the same results. But then, I only have benchmark programs for DGEMM and SGEMM at this point.

                                                 

                                                I realized that clAmdBlasTune automatically resumes from where it left off if restarted. Is it possible to make a program that reads the kdb file and reports how many of the optimizations were completed? (That could come in handy sometimes.)