0 Replies Latest reply on Feb 27, 2012 8:23 PM by ajaydarez

    NUMA-aware memory heap manager - segmentation fault

    ajaydarez

      I am a Master's student working on implementing clustering solutions on NUMA-aware AMD Opteron machines. I came across the white paper on NUMA-aware heap memory managers by Patryk Kaminski ( http://www.google.ca/url?sa=t&rct=j&q=numa%20aware%20heap%20memory%20manager&source=web&cd=1&ved=0CCcQFjAA&url=http%3A%2F%2Fdeveloper.amd.com%2FAssets%2FNUMA_aware_heap_memory_manager_article_final.pdf&ei=fjpMT5G4H8PY0QG1woHNAg&usg=AFQjCNE9DKLawS4_6HgPkyA8WiUFgSsEPA&sig2=N9NUqUvrY-PlgbQUL9xGbw ). I am very interested in it because it could significantly improve the performance of my application. I am working on a 4-node AMD Opteron machine. I downloaded the code and tried to run it, but it crashes with a segmentation fault. Even the unit tests and the benchmarks fail to run.

      I have tried the following:

      • The latest version of numactl (libnuma).
      • numactl version 1.0.2.
      • numactl version 1.0.2 with the patched libnuma.c applied.


      None of these work. GDB reports the segfault at the following location:

      TCMalloc_Central_FreeList::FetchFromSpans (this=0x2aaaaaf1de80, node=-1) at src/tcmalloc.cc:2086
      2086     src/tcmalloc.cc: No such file or directory.
      in src/tcmalloc.cc

      The full backtrace is:

      #0  TCMalloc_Central_FreeList::FetchFromSpans (this=0x2aaaaaf1de80, node=-1) at src/tcmalloc.cc:2086
      #1  0x00002aaaaacd6043 in TCMalloc_Central_FreeList::FetchFromSpansSafe (this=0x2aaaaaf1de80, node_index=0x7fffffffe624, grow=64) at src/tcmalloc.cc:2072
      #2  0x00002aaaaacd611a in TCMalloc_Central_FreeList::RemoveRangeNode (this=0x2aaaaaf1de80, start=0x7fffffffe698, end=0x7fffffffe690, N=<value optimized out>, nodeIndex=-1, grow=64) at src/tcmalloc.cc:2052
      #3  0x00002aaaaacd62db in TCMalloc_Central_FreeList::RemoveRange (this=0x2aaaaaf1de80, start=0x7fffffffe698, end=0x7fffffffe690, N=32) at src/tcmalloc.cc:2025
      #4  0x00002aaaaacd6364 in TCMalloc_ThreadCache::FetchFromCentralCache (this=0x60c000, cl=1, byte_size=8) at src/tcmalloc.cc:2228
      #5  0x00002aaaaacdb607 in Allocate (size=1) at src/tcmalloc.cc:2197
      #6  do_malloc (size=1) at src/tcmalloc.cc:2955
      #7  malloc (size=1) at src/tcmalloc.cc:3191
      #8  0x00002aaaaacd78c5 in TCMallocGuard (__initialize_p=<value optimized out>, __priority=<value optimized out>) at src/tcmalloc.cc:2848
      #9  __static_initialization_and_destruction_0 (__initialize_p=<value optimized out>, __priority=<value optimized out>) at src/tcmalloc.cc:2866
      #10 0x00002aaaaacdb356 in __do_global_ctors_aux () from /usr/sunkay8/cshome/anandan/Research/Libraries/lib/libtcmalloc_minimal.so.0
      #11 0x00002aaaaacd0a1b in _init () from /usr/sunkay8/cshome/anandan/Research/Libraries/lib/libtcmalloc_minimal.so.0
      #12 0x00002aaaab97d4c8 in ?? ()
      #13 0x00002aaaaaab83eb in call_init () from /lib64/ld-linux-x86-64.so.2
      #14 0x00002aaaaaab84f5 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
      #15 0x00002aaaaaaabaaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
      #16 0x0000000000000001 in ?? ()
      #17 0x00007fffffffeb66 in ?? ()
      #18 0x0000000000000000 in ?? ()
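
My own unconfirmed guess, which I have not checked against the heap manager's source: frames #0 to #2 show node=-1 (nodeIndex=-1) being passed into FetchFromSpans while tcmalloc's static initializer (TCMallocGuard) is still running, before main. libnuma uses -1 as its error value (for example, numa_available() returns -1 when NUMA is not available, and numa_node_of_cpu() returns -1 for an invalid CPU). If such a -1 ends up being used as an array index, the allocator would read out of bounds. The sketch below is purely hypothetical: per_node_lists, safe_node_index, list_for_node and MAX_NODES are names I made up, not the heap manager's real code. It only illustrates the kind of guard that would keep a -1 node from being used as an index.

```c
#include <assert.h>

/* Hypothetical: number of NUMA nodes on a 4-node Opteron machine. */
#define MAX_NODES 4

/* Hypothetical stand-in for the allocator's per-node span lists.
 * This is NOT the heap manager's actual data structure. */
struct free_list {
    int num_free;
};

static struct free_list per_node_lists[MAX_NODES];

/* Map a raw node number (which may be -1 if a libnuma lookup failed
 * or NUMA is unavailable) to a valid array index. Out-of-range values
 * fall back to node 0 instead of indexing the array with -1. */
static int safe_node_index(int node)
{
    int idx = (node < 0 || node >= MAX_NODES) ? 0 : node;
    assert(idx >= 0 && idx < MAX_NODES); /* invariant: always a valid index */
    return idx;
}

/* Return the free list for a node without ever touching
 * per_node_lists[-1]. */
static struct free_list *list_for_node(int node)
{
    return &per_node_lists[safe_node_index(node)];
}
```

For example, list_for_node(-1) would return node 0's list rather than reading per_node_lists[-1]. Falling back to node 0 is an arbitrary choice on my part; failing loudly during initialization might be the better option.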

      Is there a known configuration issue, or something I am doing wrong, that could cause this?
      I would also like to know whether any further work has been done on this since the white paper and, if so, where I can find the updated source code.

      It would be a great help to my research if you could spare some time to help me out.