5 Replies Latest reply on Mar 29, 2012 5:07 PM by chipf

    Memory leak when ACML_FAST_MALLOC is enabled

    ionutg

      Hi,

       

      I just observed a memory leak in the multi-threaded ACML 5.1.0 with FMA4 compiled for Intel Fortran (ifort64_fma4_mp) when ACML_FAST_MALLOC is enabled. The program calls dgemm a few hundred times on a set of 3 matrices (10000x10000). The memory usage increases linearly until the process runs out of memory and is killed with "Insufficient virtual memory". Unsetting ACML_FAST_MALLOC leaves the memory usage flat.
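A rough way to put numbers on the symptom described above, offered as a sketch rather than as the method actually used in this thread. It computes the baseline footprint of the three matrices and estimates how fast resident memory grows across samples. The helper names are hypothetical, and `read_rss_kib` assumes a Linux `/proc` filesystem.

```python
BYTES_PER_DOUBLE = 8  # dgemm operates on double-precision (8-byte) reals


def matrix_bytes(rows, cols, count=1):
    """Bytes needed to hold `count` dense double-precision matrices."""
    return rows * cols * BYTES_PER_DOUBLE * count


def read_rss_kib(pid="self"):
    """Resident set size in KiB, read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc status file")


def growth_per_sample(samples):
    """Least-squares slope of evenly spaced memory samples.

    A slope near zero means flat usage; a steady positive slope
    is what a linear leak looks like.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

Three 10000x10000 double-precision matrices need `matrix_bytes(10000, 10000, 3)` bytes, about 2.4 GB, so any sustained growth beyond that baseline across dgemm calls is memory the library is not returning.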

       

      Ionut

        • Re: Memory leak when ACML_FAST_MALLOC is enabled
          chipf

          We'll have to try and reproduce this.  If you have a simple test case it would be helpful.

          Are you multithreading in the application, and then calling DGEMM?  Or is the application running a single master thread that expects DGEMM to multithread?

           

          How many threads?

            • Re: Memory leak when ACML_FAST_MALLOC is enabled
              ionutg

              Hi Chip,

               

              I am actually using a mix of 4 ACML functions. The application is single threaded and takes advantage of the threaded ACML. I use anywhere between 8 and 64 threads, depending on the size of the matrices. The system is a 64-core Interlagos and can actually support 64 threads.

               

              I'll try to reproduce it on a simple example, but until then here is the succession of ACML calls. Please let me know if you see a smoking gun. dsymm operates on large NxN matrices, dgemm on smaller 6xN ones.

              N = 5370
              while not_converged:
                   dsyrk (NxN)
                   dsymv (NxN, N)
                   dsymm (NxN)
                   dsymm (NxN)
                   dgemm (6xN)
                   dgemm (6xN)

               

              Thanks a lot,

              Ionut

                • Re: Memory leak when ACML_FAST_MALLOC is enabled
                  chipf

                  I have started a simple test case using gfortran.  DGEMM seems to work as expected; I'll now add cases for dsymm and for the ifort compiler.

                   

                  One thing you can try is the ACML_FAST_MALLOC_DEBUG environment variable, as documented in the user guide.  You might also try reducing the number of threads, both to cut down the volume of messages and to see whether that affects the problem.

                  Maybe this will tell us something useful.
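One possible way to script the suggestion above is sketched here. It is not taken from the ACML user guide. It assumes that setting `ACML_FAST_MALLOC` and `ACML_FAST_MALLOC_DEBUG` to `1` turns them on (check the guide for the accepted values), that the multi-threaded library honours `OMP_NUM_THREADS`, and that the debug messages go to stdout or stderr. The function names and the `./my_solver` command in the usage note are hypothetical.

```python
import os
import subprocess


def debug_env(threads, base_env=None):
    """Copy the environment and add the fast-malloc debug settings.

    The value "1" for the two ACML variables is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ACML_FAST_MALLOC"] = "1"
    env["ACML_FAST_MALLOC_DEBUG"] = "1"
    env["OMP_NUM_THREADS"] = str(threads)  # fewer threads, fewer messages
    return env


def run_with_debug(command, threads, log_path="temp"):
    """Run `command` and capture stdout and stderr together in `log_path`."""
    with open(log_path, "w") as log:
        result = subprocess.run(
            command,
            env=debug_env(threads),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode
```

For example, `run_with_debug(["./my_solver"], threads=8)` would run the application with 8 threads and leave the debug messages in a file named `temp`.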

                   

                  I'm assuming this is a Linux environment?

              • Re: Memory leak when ACML_FAST_MALLOC is enabled
                chipf

                It seems that, with the problem sizes you are using, dsyrk calls dgemm in such a way that the fast malloc mechanism is sometimes effective and sometimes not, and this appears to be causing the leak.

                 

                There is a variable, ACML_FAST_MALLOC_CHUNK_SIZE, that controls how large an allocation can be and still be retained.  It's set at 10MB, which is too small for this case.  I was using N=6000, and I was able to work around the issue by setting ACML_FAST_MALLOC_CHUNK_SIZE to 35000000.

                I chose the size by grabbing debug output into a file called temp.

                I found the largest allocation requested using:

                grep "new malloc size" temp | sed -e"s/^.*size //" | sort -n

                 

                I just rounded that largest number up a bit and then memory usage no longer showed a leak.  For 32 threads, memory use capped at 1.8G.

                This doesn't resolve the actual bug, but it is an effective workaround.  With the smaller size, the fast malloc is essentially not working anyway, so there would be no performance benefit even if the bug were fixed.
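The chunk-size selection described above (find the largest requested allocation in the debug output, then round it up a bit) could also be scripted along these lines. This is a sketch under assumptions: the only thing taken from the thread is that the relevant lines contain the text "new malloc size" followed by a number, which is what the grep/sed command implies. The function names and the 1,000,000-byte rounding step are illustrative choices, not ACML settings.

```python
import re

# Assumed line shape, inferred from the grep/sed pipeline above:
#   "... new malloc size <bytes> ..."
MALLOC_SIZE = re.compile(r"new malloc size\s+(\d+)")


def largest_allocation(lines):
    """Largest size found on 'new malloc size' lines, or 0 if none match."""
    largest = 0
    for line in lines:
        match = MALLOC_SIZE.search(line)
        if match:
            largest = max(largest, int(match.group(1)))
    return largest


def suggest_chunk_size(largest, step=1_000_000):
    """Round up to the next multiple of `step` strictly above `largest`."""
    return (largest // step + 1) * step


def chunk_size_from_log(path="temp"):
    """Read the debug log and return a candidate ACML_FAST_MALLOC_CHUNK_SIZE."""
    with open(path) as log:
        return suggest_chunk_size(largest_allocation(log))
```

The value returned by `chunk_size_from_log("temp")` would then be exported as ACML_FAST_MALLOC_CHUNK_SIZE before rerunning the application, and memory usage checked again to confirm that it levels off.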