7 Replies Latest reply on Apr 14, 2009 5:50 PM by twilkens

    ACML on intel processors


      This is a simple question: will the libraries work on Intel processors, or are they only for AMD processors?


        • ACML on intel processors

          Yes, ACML should work on Intel processors, and we try to keep performance competitive with alternative solutions.

          If you ever find a case where it isn't, we would like to know about it.  Bug submissions and service requests can be posted at http://support.amd.com/consumer

            • ACML on intel processors
              There is a performance issue with the Level 3 BLAS routines when using the new ACML 4.2.0 on Intel processors. Performance is poor, about 50% of machine peak.

              The problem has been identified, and it will be resolved in the next release of ACML.
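As a rough way to check a "percent of machine peak" figure like the one above, the sketch below converts a measured DGEMM time into a fraction of theoretical peak. It is only an illustration: the function names are hypothetical, the 2*m*n*k operation count is the standard estimate for a matrix multiply, and the flops-per-cycle value and the timing in the example are assumptions to replace with numbers for your own machine.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: cores x clock (GHz) x flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


def dgemm_gflops(m, n, k, seconds):
    """Achieved GFLOPS for C = A*B with A (m x k) and B (k x n).

    Uses the usual 2*m*n*k count (one multiply and one add per inner step).
    """
    return 2.0 * m * n * k / seconds / 1e9


def fraction_of_peak(m, n, k, seconds, cores, clock_ghz, flops_per_cycle):
    """Achieved DGEMM rate divided by theoretical peak."""
    return dgemm_gflops(m, n, k, seconds) / peak_gflops(cores, clock_ghz, flops_per_cycle)


if __name__ == "__main__":
    # Hypothetical measurement: a 2000 x 2000 x 2000 DGEMM taking 3.0 s on one
    # core at 2.66 GHz, assuming 4 double-precision flops per cycle per core.
    frac = fraction_of_peak(2000, 2000, 2000, 3.0,
                            cores=1, clock_ghz=2.66, flops_per_cycle=4)
    print(f"fraction of peak: {frac:.2f}")
```

Timing a single large DGEMM call with the library linked in and feeding the result to `fraction_of_peak` gives a number that can be compared directly against figures quoted as a percentage of peak.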
            • ACML on intel processors

              It seems there is some trouble!

              I am calling zfft1d, but the Compaq Fortran compiler cannot find libacml_dll.dll, even though the file is in that directory.

                • ACML on intel processors

                  Question though,

                  For example acml-4.1.0 that comes with PGI-7.2, is compiled with:


                  -tp x64,barcelona-64

                  This should produce valid Intel, AMD and Barcelona code. Why is it not also compiled with core2-64 or nehalem-64? The unified binary options of the PGI compilers should make this easy.


                  Or does most of ACML's performance come from assembly routines? In that case, which CPU the code is optimized for would not matter so much, as long as the compiler does not produce code that raises SIGILL (as I have had happen before, but not in ACML).



                    • ACML on intel processors
                      For the BLAS and FFT functions, the significant performance comes from assembly routines. The new DGEMM assembly kernel is not optimal for Intel machines.

                      The PGI processor flags were arrived at by testing on various platforms. They offered a reasonable compromise between performance and library image size, for the functions where FORTRAN is the key performance driver.
                        • ACML on intel processors

                          I found something interesting.  I did a simple test of ACML 4.2.0 FFT performance on an Intel Linux box (Ubuntu 8.10, Q9450) and on an AMD box with the same Ubuntu release but a BE2300.  I understand the Q9450 has a higher clock speed than the BE2300 (2.66 vs. 1.9 GHz).  However, the speed difference is much larger than that (7 seconds vs. 16 seconds).  Can someone tell me why?  I remember AMD always having better floating-point performance than Intel in the old days.  Did things change so much?

                          Interestingly enough, FFTW on the Intel machine is much slower, about 15 seconds.

                          The program did 204800 forward and reverse FFTs with 2048 points each.
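To make the comparison above concrete, the following sketch separates the part of the speed difference explained by clock frequency from the part that must come from work done per clock. It uses only the figures quoted in the post (7 s vs. 16 s, 2.66 vs. 1.9 GHz); the function name is hypothetical, and it assumes both runs were single-threaded and otherwise comparable, which the post does not state.

```python
def per_clock_speedup(time_slow_s, time_fast_s, clock_slow_ghz, clock_fast_ghz):
    """Return (time_ratio, clock_ratio, per_clock_ratio).

    time_ratio      : how many times faster the fast machine finished
    clock_ratio     : how much higher its clock frequency is
    per_clock_ratio : speedup left over after accounting for clock frequency
    """
    time_ratio = time_slow_s / time_fast_s
    clock_ratio = clock_fast_ghz / clock_slow_ghz
    return time_ratio, clock_ratio, time_ratio / clock_ratio


if __name__ == "__main__":
    # Figures from the post: BE2300 at 1.9 GHz took 16 s, Q9450 at 2.66 GHz took 7 s.
    t, c, p = per_clock_speedup(16.0, 7.0, 1.9, 2.66)
    print(f"time ratio {t:.2f}, clock ratio {c:.2f}, per-clock ratio {p:.2f}")
```

Under those assumptions the clock accounts for a factor of about 1.4, while the measured gap is about 2.3, so roughly a 1.6x difference remains to be explained by per-clock throughput rather than frequency.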

                            • ACML on intel processors


                                  Were the FFTs you were running single or double precision?  The FFT codes should run rather effectively on Intel chips with a 128-bit FPU.


                                  In the near term, the single- and double-precision real BLAS routines, and LAPACK as well, will see substantial performance gains on Woodcrest, Penryn and iCore processors from Intel.  DGEMM performance on these will asymptotically approach:


                              Woodcrest: ~91%

                              Penryn: ~94-95%

                              iCore: ~+95%


                              Dr. Tim Wilkens