9 Replies Latest reply on Nov 25, 2015 4:47 AM by leonmaxx

    Memory corruption bug in AMD Catalyst 14.12 (fglrx-14.501)

    leonmaxx

      Hello.

       

      I found a memory corruption bug in the new Catalyst release. Here are some backtraces:

       

      Fatal Error: Memory access error at 0x000000000049C463, fault address is 0x00000000005D38E0.
      
      Register dump:
      RAX     = 00000000005D38E0  RBX     = 00000000013F6540  
      RCX     = 00000000000060B0  RDX     = 0000000001005EA8  
      RSI     = 0000000000000000  RDI     = 00000000013F5CC0  
      RSP     = 00007FFFE2F03900  RBP     = 00007FFFE2F03C10  
      R8      = 00000000013B9200  R9      = 00000000013F6520  
      R10     = 00007FE186EA77B8  R11     = 00007FE1840A9470  
      R12     = 0000000000EA15C0  R13     = 00007FE18584F140  
      R14     = 00007FE185881140  R15     = 00007FE185881178  
      RIP     = 000000000049C463  EFLAGS  = 0000000000010202  
      
      
      Stack trace:
      0: 0000000000415E03,  /home/maxx/Projects/git/m3/build/Linux-amd64-clang/debug/landscape_demo() [0x415e03]
      1: 00000000004160D6,  /home/maxx/Projects/git/m3/build/Linux-amd64-clang/debug/landscape_demo() [0x4160d6]
      2: 00007FE187E28340,  /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fe187e28340]
      3: 000000000049C463,  /home/maxx/Projects/git/m3/build/Linux-amd64-clang/debug/landscape_demo() [0x49c463]
      4: 000000000049CF43,  /home/maxx/Projects/git/m3/build/Linux-amd64-clang/debug/landscape_demo() [0x49cf43]
      5: 0000000000427C15,  /home/maxx/Projects/git/m3/build/Linux-amd64-clang/debug/landscape_demo(_ZdlPv+0x15) [0x427c15]
      6: 00007FE1840A734D,  /usr/lib/dri/fglrx_dri.so(+0xe7734d) [0x7fe1840a734d]
      7: 00007FE183E2CFF4,  /usr/lib/dri/fglrx_dri.so(+0xbfcff4) [0x7fe183e2cff4]
      8: 00007FE183517655,  /usr/lib/dri/fglrx_dri.so(+0x2e7655) [0x7fe183517655]
      9: 00007FE18365ED52,  /usr/lib/dri/fglrx_dri.so(+0x42ed52) [0x7fe18365ed52]
      10: 00007FE184345301,  /usr/lib/dri/fglrx_dri.so(+0x1115301) [0x7fe184345301]
      11: 00007FE184359CF8,  /usr/lib/dri/fglrx_dri.so(+0x1129cf8) [0x7fe184359cf8]
      12: 00007FE184332656,  /usr/lib/dri/fglrx_dri.so(+0x1102656) [0x7fe184332656]
      13: 00007FE184342CEB,  /usr/lib/dri/fglrx_dri.so(+0x1112ceb) [0x7fe184342ceb]
      14: 00007FE184342DD6,  /usr/lib/dri/fglrx_dri.so(+0x1112dd6) [0x7fe184342dd6]
      15: 00007FE18506E804,  /usr/lib/dri/fglrx_dri.so(+0x1e3e804) [0x7fe18506e804]
      

       

      and:

       

      #0  __GI___pthread_mutex_lock (mutex=0xb53800) at ../nptl/pthread_mutex_lock.c:66
      #1  0x00000000004a9725 in je_malloc_mutex_lock (mutex=0x3020108) at ../../src/jemalloc/internal/mutex.h:77
      #2  0x00000000004ac987 in je_arena_dalloc_large (arena=0x3020100, chunk=0x800000, ptr=0xa39f30) at arena.c:1978
      #3  0x000000000049cde0 in je_arena_dalloc (arena=0x3020100, chunk=0x800000, ptr=0xa39f30, try_tcache=true) at ../../src/jemalloc/internal/arena.h:1056
      #4  je_idalloct (ptr=<optimized out>, try_tcache=<optimized out>, ptr=<optimized out>, try_tcache=<optimized out>) at ../../src/jemalloc/internal/jemalloc_internal.h:908
      #5  je_iqalloct (ptr=0xa39f30, try_tcache=true) at ../../src/jemalloc/internal/jemalloc_internal.h:927
      #6  je_iqalloc (ptr=0xa39f30) at ../../src/jemalloc/internal/jemalloc_internal.h:934
      #7  ifree (ptr=0xa39f30) at jemalloc.c:1236
      #8  0x000000000049cf43 in je_free (ptr=0xa39f30) at jemalloc.c:1311
      #9  0x0000000000427c15 in operator delete (pBlock=0xa39f30) at tl/new.cpp:26
      #10 0x00007ffff38a739b in ?? () from /usr/lib/dri/fglrx_dri.so
      #11 0x00007ffff362cff4 in ?? () from /usr/lib/dri/fglrx_dri.so
      #12 0x00007ffff2d17655 in ?? () from /usr/lib/dri/fglrx_dri.so
      #13 0x00007ffff2e5ed52 in ?? () from /usr/lib/dri/fglrx_dri.so
      #14 0x00007ffff3b45301 in ?? () from /usr/lib/dri/fglrx_dri.so
      #15 0x00007ffff3b59cf8 in ?? () from /usr/lib/dri/fglrx_dri.so
      #16 0x00007ffff3b32656 in ?? () from /usr/lib/dri/fglrx_dri.so
      #17 0x00007ffff3b42ceb in ?? () from /usr/lib/dri/fglrx_dri.so
      #18 0x00007ffff3b42dd6 in ?? () from /usr/lib/dri/fglrx_dri.so
      #19 0x00007ffff486e804 in ?? () from /usr/lib/dri/fglrx_dri.so
      #20 0x00007ffff70c3019 in ?? () from /usr/lib/fglrx/libGL.so.1
      #21 0x00007ffff70c305a in ?? () from /usr/lib/fglrx/libGL.so.1
      #22 0x00007ffff749b5d5 in _XFreeExtData () from /usr/lib/x86_64-linux-gnu/libX11.so.6
      #23 0x00007ffff74a76b0 in _XFreeDisplayStructure () from /usr/lib/x86_64-linux-gnu/libX11.so.6
      #24 0x00007ffff74954ef in XCloseDisplay () from /usr/lib/x86_64-linux-gnu/libX11.so.6
      #25 0x0000000000416a57 in m3::Private::X11Display::freeX11 (this=0x7ffff5017080) at linux/x11/x11_display.cpp:106
      #26 0x000000000041695a in m3::Private::X11Display::~X11Display (this=0x7ffff5017080) at linux/x11/x11_display.cpp:32
      #27 0x0000000000416919 in m3::Private::X11Display::~X11Display (this=0x7ffff5017080) at linux/x11/x11_display.cpp:31
      #28 0x0000000000410764 in m3::Display::~Display (this=0x7fffffffdcb8) at display.cpp:38
      #29 0x0000000000404445 in main (nArgc=1, ppArgv=0x7fffffffddb8) at landscape.cpp:85
      

       

      As you can see from the traces, fglrx_dri.so tries to free corrupted memory (using operator delete) inside the call to XCloseDisplay():

       

      8: 00007FE183517655,  /usr/lib/dri/fglrx_dri.so(+0x2e7655) [0x7fe183517655] <- here is a call to operator delete(void*)
      

       

      This call to "operator delete()" tries to free corrupted memory.
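For readers following the first trace: frame 5 shows the mangled symbol _ZdlPv rather than a readable name. A minimal sketch of how such a symbol can be decoded, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle from <cxxabi.h>. The helper name demangle is hypothetical and not part of the original report.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc()-allocated buffer that the caller frees.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Calling demangle("_ZdlPv") yields "operator delete(void*)", the scalar form, which matches frame #9 of the gdb trace. The array form, operator delete[](void*), would appear as _ZdaPv instead.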

      I checked my code for errors multiple times, and found nothing.

      Valgrind reports multiple invalid writes outside allocated memory and writes to memory that has already been freed.

       

      I also checked my executable on other drivers/OSes, and it worked correctly (checked on Windows/Catalyst, Linux/r600g, Linux/i965, Linux/Nvidia) - no memory corruption problems were found.

       

      Sorry for my bad English.

      BR, Leon.

        • Re: Memory corruption bug in AMD Catalyst 14.12 (fglrx-14.501)
          leonmaxx

          UPDATE1:

          Forgot to say that this bug occurs when I close my application and it calls XCloseDisplay().

          I use Ubuntu 14.04. The test PC has the following configuration:

          CPU: AMD FX-8350

          M/B: ASUS M5A99X Evo

          RAM: 32GB AMD Radeon Memory (8GBx4)

          V/A: ASUS Radeon HD R9 280 Strix 3GB

           

          UPDATE2: AMD Catalyst 14.9 (fglrx-14.301) does not have this bug; everything works correctly.

          • Re: Memory corruption bug in AMD Catalyst 14.12 (fglrx-14.501)
            flymjj

            Hi Leon,

            Thanks for the report.

            I am investigating the issue now.

            From what I have seen, you are replacing the global operator new and operator delete and using jemalloc underneath.

            Could you let me know which version of jemalloc you are using? The backtrace differs considerably from the version I have.

            BTW: Although the spec allows it, I don't think hooking the global new and delete is a good idea in general.
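To illustrate the setup being discussed, here is a minimal sketch of replacing the global allocation functions. It is not Leon's actual tl/new.cpp, which is not shown in this thread. The counters and the backend_alloc/backend_free helpers are hypothetical. They stand in for je_malloc()/je_free() and use std::malloc/std::free so the sketch builds without jemalloc.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters, present only to make the replacement observable.
static std::size_t g_new_calls = 0;
static std::size_t g_delete_calls = 0;

// Stand-ins for the real allocator (assumption: a jemalloc build would call
// je_malloc()/je_free() here instead).
static void* backend_alloc(std::size_t n) { return std::malloc(n ? n : 1); }
static void backend_free(void* p) { std::free(p); }

// Replacing these functions affects every new/delete in the program that
// resolves to the global symbols, including calls made by shared libraries.
void* operator new(std::size_t n) {
    ++g_new_calls;
    if (void* p = backend_alloc(n)) return p;
    throw std::bad_alloc();
}
void* operator new[](std::size_t n) { return operator new(n); }

void operator delete(void* p) noexcept {
    if (p == nullptr) return;  // deleting a null pointer is a no-op
    ++g_delete_calls;
    backend_free(p);
}
void operator delete[](void* p) noexcept { operator delete(p); }

// Sized forms (C++14) forward to the unsized ones.
void operator delete(void* p, std::size_t) noexcept { operator delete(p); }
void operator delete[](void* p, std::size_t) noexcept { operator delete(p); }
```

Because the replacement is process-wide, a pointer handed to operator delete must have come from the matching operator new. In the first trace, frame 5 shows fglrx_dri.so reaching the application's operator delete, so any pointer the driver obtained elsewhere would end up in jemalloc's free path.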

            Thanks.

            Best Regards,

            David

              • Re: Memory corruption bug in AMD Catalyst 14.12 (fglrx-14.501)
                leonmaxx

                Hi David,

                Sorry for not replying for so long.

                This is jemalloc-3.6.0, the latest release from the jemalloc site.

                Further debugging showed that, most likely, the driver is trying to deallocate memory that it never allocated, i.e. it could be a static or constant data block.

                BTW: Catalyst 15.3 beta for Ubuntu 15.04 (fglrx-15.20.1013) still has the same bug.

                Thanks.

                Best Regards, Leon.

                • Re: Re: Memory corruption bug in AMD Catalyst 14.12 (fglrx-14.501)
                  leonmaxx

                  Hi David,

                  I've investigated this problem further. Now I'm sure that the driver tries to free memory that it did not allocate.

                  I've extended the function je_free to check whether the memory block being freed was actually allocated:

                  /* Add this block to je_free() or ifree() in jemalloc.c */
                  size_t usize = ivsalloc(ptr, config_prof);
                  if (usize == 0) {
                    /* ptr is not a live jemalloc allocation: log it and skip the free. */
                    dlogc_printf(_LL_ERROR, 1, __FILE__, __LINE__,
                      "Free\'ed pointer 0x%p does not allocated.\n",
                      ptr);
                    return;
                  }
                  

                   

                  This modification ignores pointers that were not allocated and logs them:

                  jemalloc.c(1249): Free'ed pointer 0x00000000028AF9A0 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00000000028B2870 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x0000000002A02710 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00000000028AF7E0 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00000000028B0360 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00000000028AF780 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00000000028B05B0 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x0000000002A02690 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00007F41AC0BBB70 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x000000000241CCF0 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x000000000241CCD0 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x0000000002717570 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00000000024198E0 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x000000000241D540 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x0000000002418900 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x000000000241B3D0 does not allocated.
                  jemalloc.c(1249): Free'ed pointer 0x00000000027166E0 does not allocated.
                  

                   

                  All of these calls to free() or delete come from inside the XCloseDisplay() call.

                  I've also implemented allocation tracing and logged all allocations and frees; the log file is attached to this message.

                  The beginning of the improper frees is marked with "-- START --". If you check those addresses, you'll see that they were never allocated and that they don't reside in the dynamic memory address range.
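The tracing idea can be sketched roughly as follows. This is not the tracer used for the attached log. AllocationTracker and its methods are hypothetical names, and the sketch is single-threaded and uses std::malloc/std::free rather than jemalloc. It also cannot be plugged into a replaced operator new as-is, because the standard containers would then allocate recursively.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Hypothetical tracker: remembers every live allocation and refuses to free
// pointers it never handed out, recording them instead.
class AllocationTracker {
public:
    void* allocate(std::size_t size) {
        void* p = std::malloc(size ? size : 1);
        if (p != nullptr) live_.insert(p);
        return p;
    }

    // Returns true if ptr was released (or was null); returns false if ptr is
    // unknown, in which case it is recorded and NOT passed to free().
    bool release(void* ptr) {
        if (ptr == nullptr) return true;
        auto it = live_.find(ptr);
        if (it == live_.end()) {
            unknown_frees_.push_back(ptr);
            return false;
        }
        live_.erase(it);
        std::free(ptr);
        return true;
    }

    std::size_t live_count() const { return live_.size(); }
    const std::vector<void*>& unknown_frees() const { return unknown_frees_; }

private:
    std::unordered_set<void*> live_;    // pointers currently allocated
    std::vector<void*> unknown_frees_;  // pointers freed but never allocated
};
```

Passing the address of a static variable to release() returns false and records the pointer, which is the same situation as the "does not allocated" log lines above.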

                   

                  Thanks.

                  Best Regards,

                  Leon.

                • Re: Memory corruption bug in AMD Catalyst 14.12 (fglrx-14.501)
                  leonmaxx

                  Hello again. I'll remember the day the Radeon Software Crimson driver for Linux was released.

                   

                  The latest drivers finally resolved the two bugs that I reported here!

                  (First bug: AMD Catalyst - Mouse cursor corruption)

                   

                  Thanks for your hard work!

                   

                  BR, Leon.