4 Replies Latest reply on Sep 7, 2010 3:42 PM by zeland

    Using ACML-GPU and HD4870x2 to run HPL

    scksz
      Why does the check ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) FAIL when the Linpack NB > 458?

      Hi, all!

      I want to use the ACML-GPU library and an HD4870x2 graphics card to run High-Performance Linpack (HPL).

      When NB<=478, ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) is equal to zero and the check passes.

      But when N>1000 and NB>478, ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) is not equal to zero; the result is wrong and the check fails!

      I want to know why NB cannot be greater than 458, and how I can increase my Linpack benchmark score.
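For readers unfamiliar with the quantity in the question, here is a minimal sketch of how the scaled residual ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) can be computed. It is not HPL's code: the helper names, the small pure-Python solver, and the use of `sys.float_info.epsilon` for eps are assumptions made for illustration (HPL's own machine-epsilon routine may return a value that differs by a small constant factor). HPL reports PASSED when this value is below the threshold set in HPL.dat (16.0 in the sample input file shipped with HPL), so a small nonzero value is also a pass.

```python
import random
import sys

THRESHOLD = 16.0  # value used in the sample HPL.dat; adjust to match your input file


def inf_norm_vec(v):
    """Infinity norm of a vector: largest absolute entry."""
    return max(abs(x) for x in v)


def inf_norm_mat(a):
    """Infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def matvec(a, x):
    """Dense matrix-vector product A*x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def solve(a, b):
    """Illustrative solver: Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))  # pivot row
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def scaled_residual(a, x, b):
    """||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N)."""
    n = len(b)
    eps = sys.float_info.epsilon  # assumption: stands in for HPL's eps
    r = [axi - bi for axi, bi in zip(matvec(a, x), b)]
    denom = eps * (inf_norm_mat(a) * inf_norm_vec(x) + inf_norm_vec(b)) * n
    return inf_norm_vec(r) / denom


rng = random.Random(0)
n = 50
a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [rng.uniform(-0.5, 0.5) for _ in range(n)]

x = solve(a, b)
good = scaled_residual(a, x, b)

bad_x = x[:]
bad_x[0] += 1e-3  # simulate a wrong answer, e.g. from a faulty DGEMM
bad = scaled_residual(a, bad_x, b)

print("correct solve:", good, "PASSED" if good < THRESHOLD else "FAILED")
print("corrupted x  :", bad, "PASSED" if bad < THRESHOLD else "FAILED")
```

Because the denominator scales with machine epsilon, even a tiny error in x pushes the value far above the threshold, which is why a library bug shows up as an outright FAILED rather than a slightly worse number.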

        • Using ACML-GPU and HD4870x2 to run HPL
          chipf

          The 1.1 version of ACML-GPU has a bug that causes this problem. We have fixed the bug and released the fix as version 1.1.1. Please try this new version; it was released specifically for this problem.

          Using the new version and a 5870 card, the best results were obtained with N=22000 and NB=2560. This system has 8GB of memory.
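To put N=22000 and 8GB in context, the following is a rough sizing sketch. The function names are hypothetical, the 80% memory fraction is a common rule of thumb rather than anything stated in this thread, and the flop count (2/3*N^3 + 3/2*N^2) is the one HPL uses when it reports its GFLOPS figure. The rates passed in at the end are the ones other posters quote in this thread and are used purely as illustrative inputs.

```python
GIB = 2 ** 30


def hpl_matrix_bytes(n, word_bytes=8):
    """Bytes needed to hold the N x N double-precision matrix."""
    return word_bytes * n * n


def hpl_flops(n):
    """Operation count HPL uses for its GFLOPS figure."""
    return (2.0 / 3.0) * n ** 3 + 1.5 * n ** 2


def max_n_for_memory(mem_bytes, fraction=0.8, word_bytes=8):
    """Largest N whose matrix fits in `fraction` of memory (assumed rule of thumb)."""
    return int((fraction * mem_bytes / word_bytes) ** 0.5)


def runtime_seconds(n, gflops):
    """Expected wall time for problem size N at a sustained rate in GFLOPS."""
    return hpl_flops(n) / (gflops * 1e9)


n = 22000
print(f"matrix for N={n}: {hpl_matrix_bytes(n) / GIB:.2f} GiB")
print(f"largest N for 8 GiB at 80%: {max_n_for_memory(8 * GIB)}")
for rate in (60, 110):  # illustrative rates quoted elsewhere in this thread
    print(f"N={n} at {rate} GFLOPS: {runtime_seconds(n, rate):.1f} s")
```

Under these assumptions the N=22000 matrix uses a bit less than half of the 8GB, so a larger N would still fit in host memory; whether it runs faster depends on the GPU library and is not something this sketch can predict.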

            • Using ACML-GPU and HD4870x2 to run HPL
              scksz

              Thank you very much! With ACML-GPU 1.1.1, the calculation is still correct when NB > 458. But the HPL result is lower than the theoretical value: using one HD4870x2 card with 2 GPUs, the maximum I get is only 110 GFLOPS in double precision. How can I increase the HPL result?

                • Using ACML-GPU and HD4870x2 to run HPL
                  zeland

                   

                  Originally posted by: scksz Thank you very much! With ACML-GPU 1.1.1, the calculation is still correct when NB > 458. But the HPL result is lower than the theoretical value: using one HD4870x2 card with 2 GPUs, the maximum I get is only 110 GFLOPS in double precision. How can I increase the HPL result?

                   

                  I have this problem too. The maximum I achieve in HPL is 60 GFLOPS, but if I measure DGEMM alone, for example with time_dgemm, I obtain up to 440 GFLOPS on a 5870.

                  My system: 5870, X58, i7, 12GB RAM, driver 10.8, SDK 2.2, ACML-GPU 1.1.1 (with libCALBLAS rebuilt).
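The gap between the two figures above can be quantified with a short back-of-the-envelope sketch. It assumes, as a simplification, that essentially all HPL flops are executed in DGEMM at the standalone rate; under that assumption the remaining wall time is overhead outside the DGEMM kernel (panel factorization, host/GPU transfers, and so on). The function names are hypothetical.

```python
def fraction_of_reference(hpl_gflops, reference_gflops):
    """HPL rate as a fraction of a reference rate (e.g. standalone DGEMM)."""
    if reference_gflops <= 0:
        raise ValueError("reference rate must be positive")
    return hpl_gflops / reference_gflops


def time_outside_dgemm(hpl_gflops, dgemm_gflops):
    """Share of wall time not explained by DGEMM at its standalone rate.

    Assumes all flops run in DGEMM at `dgemm_gflops`; the rest is overhead.
    """
    return 1.0 - fraction_of_reference(hpl_gflops, dgemm_gflops)


hpl_rate, dgemm_rate = 60.0, 440.0  # figures quoted in the post above
print(f"HPL reaches {fraction_of_reference(hpl_rate, dgemm_rate):.1%} of the DGEMM rate")
print(f"about {time_outside_dgemm(hpl_rate, dgemm_rate):.1%} of run time is outside DGEMM")
```

If that estimate is roughly right, tuning the DGEMM kernel further would help little; the larger gains would come from reducing the non-DGEMM time, for example by trying different N and NB values.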

                • Using ACML-GPU and HD4870x2 to run HPL
                  zeland

                   


                   

                  Originally posted by: chipf Using the new version and a 5870 card, best results were obtained with N=22000, and NB=2560.   This system has 8GB of memory.



                  I'm interested in running the HPL test on an ATI GPU. Could you show your HPL.dat file? What results did you obtain?