
NUMA aware memory heap manager - segmentation fault

Question asked by ajaydarez on Feb 27, 2012

I am a Master's student working on implementing clustering solutions on NUMA-aware AMD Opteron machines.  I came across the white paper titled NUMA aware heap managers by Patryk Kaminski ( http://www.google.ca/url?sa=t&rct=j&q=numa%20aware%20heap%20memory%20manager&source=web&cd=1&ved=0CCcQFjAA&url=http%3A%2F%2Fdeveloper.amd.com%2FAssets%2FNUMA_aware_heap_memory_manager_article_final.pdf&ei=fjpMT5G4H8PY0QG1woHNAg&usg=AFQjCNE9DKLawS4_6HgPkyA8WiUFgSsEPA&sig2=N9NUqUvrY-PlgbQUL9xGbw ).  I am very interested in it because it might significantly improve the performance of my application.  I am working with a 4-node AMD Opteron machine.  I downloaded the code and tried to run it, but it fails with a segmentation fault.  Even the unit tests and the benchmarks do not run.

I tried the following things:

  • Tried the latest version of numactl (libnuma).
  • Tried version 1.0.2 of numactl.
  • Tried version 1.0.2 of numactl with the patched libnuma.c applied.


None of these works.  When I run the program under GDB, it reports the segfault at the following location:

 

TCMalloc_Central_FreeList::FetchFromSpans (this=0x2aaaaaf1de80, node=-1) at src/tcmalloc.cc:2086
2086     src/tcmalloc.cc: No such file or directory.
in src/tcmalloc.cc

The backtrace reveals the following:

#0  TCMalloc_Central_FreeList::FetchFromSpans (this=0x2aaaaaf1de80, node=-1) at src/tcmalloc.cc:2086
#1  0x00002aaaaacd6043 in TCMalloc_Central_FreeList::FetchFromSpansSafe (this=0x2aaaaaf1de80, node_index=0x7fffffffe624, grow=64) at src/tcmalloc.cc:2072
#2  0x00002aaaaacd611a in TCMalloc_Central_FreeList::RemoveRangeNode (this=0x2aaaaaf1de80, start=0x7fffffffe698, end=0x7fffffffe690, N=<value optimized out>, nodeIndex=-1, grow=64) at src/tcmalloc.cc:2052
#3  0x00002aaaaacd62db in TCMalloc_Central_FreeList::RemoveRange (this=0x2aaaaaf1de80, start=0x7fffffffe698, end=0x7fffffffe690, N=32) at src/tcmalloc.cc:2025
#4  0x00002aaaaacd6364 in TCMalloc_ThreadCache::FetchFromCentralCache (this=0x60c000, cl=1, byte_size=8) at src/tcmalloc.cc:2228
#5  0x00002aaaaacdb607 in Allocate (size=1) at src/tcmalloc.cc:2197
#6  do_malloc (size=1) at src/tcmalloc.cc:2955
#7  malloc (size=1) at src/tcmalloc.cc:3191
#8  0x00002aaaaacd78c5 in TCMallocGuard (__initialize_p=<value optimized out>, __priority=<value optimized out>) at src/tcmalloc.cc:2848
#9  __static_initialization_and_destruction_0 (__initialize_p=<value optimized out>, __priority=<value optimized out>) at src/tcmalloc.cc:2866
#10 0x00002aaaaacdb356 in __do_global_ctors_aux () from /usr/sunkay8/cshome/anandan/Research/Libraries/lib/libtcmalloc_minimal.so.0
#11 0x00002aaaaacd0a1b in _init () from /usr/sunkay8/cshome/anandan/Research/Libraries/lib/libtcmalloc_minimal.so.0
#12 0x00002aaaab97d4c8 in ?? ()
#13 0x00002aaaaaab83eb in call_init () from /lib64/ld-linux-x86-64.so.2
#14 0x00002aaaaaab84f5 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#15 0x00002aaaaaaabaaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#16 0x0000000000000001 in ?? ()
#17 0x00007fffffffeb66 in ?? ()
#18 0x0000000000000000 in ?? ()
Is there any known issue with this configuration, or is there something I am doing wrong that might cause this?
I would also like to know whether any further work has been done on this since the white paper and, if so, where I can find the new source code.

 

It would be of great help to my research if you could spare some time and help me out.
