3 Replies Latest reply on Apr 16, 2012 1:23 PM by akphysics

    Segfault in libamdocl64.so under Linux.

    akphysics

      Hi, I'm developing a physics simulator that runs about 8 different kernels in a loop.  I use events because some operations can happen in parallel, though I'm not using an out-of-order command queue yet.  After queueing hundreds of thousands of iterations, memory usage quickly balloons out of control and then I get a segfault.

       

      <code>

      Program received signal SIGSEGV, Segmentation fault.

      0x00007ffff5d6fcd7 in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

       

      (gdb) bt

      #0  0x00007ffff5d6fcd7 in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #1  0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #2  0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #3  0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #4  0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #5  0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #6  0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #7  0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #8  0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #9  0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #10 0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      #11 0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

      [repeated more than 30000 times; I don't know how far it goes.  As an aside, I've never seen a backtrace more than 100 deep.]

      </code>

       

      I'm using OpenCL 1.1 AMD-APP (898.1)

       

      Any ideas?

        • Re: Segfault in libamdocl64.so under Linux.
          akphysics

          Here is the bottom of the backtrace:

           

          <code>

          [SNIP!]

          #261882 0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

          #261883 0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

          #261884 0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

          #261885 0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

          #261886 0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

          #261887 0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

          #261888 0x00007ffff5d4154f in clReleaseEvent () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

          #261889 0x00000000004af5c2 in cl::detail::ReferenceHandler<_cl_event*>::release (event=0x2ac4ad00) at /usr/include/CL/cl.hpp:1086

          #261890 0x00000000004b10b1 in cl::detail::Wrapper<_cl_event*>::release (this=0x12d1908) at /usr/include/CL/cl.hpp:1133

          #261891 0x00000000004affe8 in cl::detail::Wrapper<_cl_event*>::~Wrapper (this=0x12d1908, __in_chrg=<optimized out>) at /usr/include/CL/cl.hpp:1103

          #261892 0x00000000004af650 in cl::Event::~Event (this=0x12d1908, __in_chrg=<optimized out>) at /usr/include/CL/cl.hpp:1538

          #261893 0x00000000004b36c8 in std::_Destroy<cl::Event> (__pointer=0x12d1908) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:89

          #261894 0x00000000004b2fdc in std::_Destroy_aux<false>::__destroy<cl::Event*> (__first=0x12d1908, __last=0x12d1910) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:99

          #261895 0x00000000004b2329 in std::_Destroy<cl::Event*> (__first=0x12d1900, __last=0x12d1910) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:122

          #261896 0x00000000004b1435 in std::_Destroy<cl::Event*, cl::Event> (__first=0x12d1900, __last=0x12d1910) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:148

          #261897 0x00000000004b0592 in std::vector<cl::Event, std::allocator<cl::Event> >::~vector (this=0x7fffffffd220, __in_chrg=<optimized out>)

              at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_vector.h:313

          #261898 0x00000000004af153 in amethyst::lib::Universe::cl_integrate (this=0x7fffffffd330) at /home/beau/src/amethyst/trunk/lib/universe.cpp:644

          #261899 0x0000000000496f3b in amethyst::lib::test_rk4 () at /home/beau/src/amethyst/trunk/lib/test.cpp:227

          #261900 0x0000000000493a0b in amethyst::Console_Menu::run (this=0x97b500, command="testrk4") at /home/beau/src/amethyst/trunk/lib/console_menu.cpp:88

          #261901 0x0000000000492f94 in amethyst::command_parse (command="testrk4") at /home/beau/src/amethyst/trunk/lib/console.cpp:215

          #261902 0x0000000000492a9d in amethyst::start_console () at /home/beau/src/amethyst/trunk/lib/console.cpp:85

          #261903 0x0000000000492848 in main (argc=1, argv=0x7fffffffd808) at /home/beau/src/amethyst/trunk/lib/main.cpp:22

          </code>

          • Re: Segfault in libamdocl64.so under Linux.
            Marix

            What GPU are you using? I think I am having the same issue on the 7970 (Cypress is fine). Where did you find the debug symbols for the libamdocl64.so?

              • Re: Segfault in libamdocl64.so under Linux.
                akphysics

                I'm using a 6970.  Wow, I didn't know that the 7000 series was supported on Linux yet.

                 

                As for debug symbols, gdb did that for me automagically.  I honestly forget how to pull them manually.  nm -C reports that there aren't any symbols.

                 

                I've been playing with this, and it appears to be related to the ridiculously long dependency chain.  When I break up the workload by calling clFinish() after 1000 iterations and starting the event chain over from scratch, I don't get these issues.
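
For anyone wanting to try the same workaround, here is a minimal sketch of the batching loop. It is not the simulator's actual code: `run_batched`, `enqueue_step`, and `batch` are hypothetical names, and `Queue` and `Event` are template parameters so the sketch compiles without the OpenCL headers. With cl.hpp they would presumably be `cl::CommandQueue` and `cl::Event`, where `finish()` wraps clFinish().

```cpp
#include <cstddef>
#include <vector>

// Sketch only. `enqueue_step` is a hypothetical callback that enqueues one
// iteration's kernels, makes them wait on `wait`, and returns the event of
// the last kernel it enqueued.
template <typename Event, typename Queue, typename EnqueueStep>
void run_batched(Queue& queue, std::size_t iterations, std::size_t batch,
                 EnqueueStep enqueue_step) {
    std::vector<Event> wait;  // events the next iteration depends on
    for (std::size_t i = 0; i < iterations; ++i) {
        Event done = enqueue_step(queue, wait);
        wait.assign(1, done);  // the next iteration waits on this one

        // Every `batch` iterations: drain the queue and start a fresh chain,
        // so no event ever has more than `batch` iterations behind it.
        if (batch != 0 && (i + 1) % batch == 0) {
            queue.finish();
            wait.clear();
        }
    }
    queue.finish();  // wait for whatever is left in the final partial batch
}
```

With a batch size of 1000 this matches the workaround described above: the queue is drained and the wait list is emptied before the dependency chain can grow any longer.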

                 

                I don't think the events at the beginning of the queue get freed until the last event gets freed, but I'm just speculating.
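
To illustrate that speculation, here is a small self-contained C++ model. It is not AMD's runtime code. It simply assumes that each command holds a reference to the command it waits on, so releasing the newest command destroys the whole chain recursively, much like the repeating ~Command / ~NDRangeKernelCommand / release() frames in the backtrace. `Command`, `deepest_release`, and `batch` are made-up names.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for a runtime command that keeps its dependency alive.
struct Command {
    std::shared_ptr<Command> dependsOn;
    static std::size_t depth;     // destructors currently nested on the stack
    static std::size_t maxDepth;  // deepest nesting observed so far

    explicit Command(std::shared_ptr<Command> dep) : dependsOn(std::move(dep)) {}

    ~Command() {
        ++depth;
        maxDepth = std::max(maxDepth, depth);
        dependsOn.reset();  // if we were the last owner, this recurses
        --depth;
    }
};
std::size_t Command::depth = 0;
std::size_t Command::maxDepth = 0;

// Builds a chain of `iterations` commands and returns the deepest destructor
// nesting seen while releasing it. A non-zero `batch` drops the chain every
// `batch` iterations, modelling "clFinish() and start the event chain over".
std::size_t deepest_release(std::size_t iterations, std::size_t batch) {
    Command::maxDepth = 0;
    std::shared_ptr<Command> last;
    for (std::size_t i = 0; i < iterations; ++i) {
        if (batch != 0 && i % batch == 0) {
            last.reset();  // release the previous batch's chain
        }
        last = std::make_shared<Command>(last);
    }
    last.reset();  // release whatever chain remains
    return Command::maxDepth;
}
```

In this model an unbatched chain of N commands needs N nested destructor calls, while restarting the chain every 1000 iterations caps the nesting at 1000. That would be consistent with the segfault being a stack overflow during release, though that part is a guess.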

                 

                - Beau

                 

                Message was edited by: Beau Bellamy. I miswrote the original message; I meant clFinish(), not clFlush().
