akphysics
Adept I

Segfault in libamdocl64.so under Linux.

Hi, I'm developing a physics simulator that runs about 8 different kernels in a loop. I use events because some operations can happen in parallel, though I'm not using an out-of-order command queue yet. After queueing hundreds of thousands of iterations, memory usage quickly balloons out of control and then I get a segfault.

<code>

Program received signal SIGSEGV, Segmentation fault.

0x00007ffff5d6fcd7 in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

(gdb) bt

#0  0x00007ffff5d6fcd7 in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#1  0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#2  0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#3  0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#4  0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#5  0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#6  0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#7  0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#8  0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#9  0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#10 0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#11 0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

[Repeated more than 30,000 times; I don't know how far it goes. As an aside, I've never seen a backtrace more than 100 frames deep.]

</code>

I'm using OpenCL 1.1 AMD-APP (898.1)

Any ideas?

3 Replies
akphysics
Adept I

Here is the bottom of the backtrace:

<code>

[SNIP!]

#261882 0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#261883 0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#261884 0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#261885 0x00007ffff5d6fcdc in amd::Command::~Command() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#261886 0x00007ffff5d70b1a in amd::NDRangeKernelCommand::~NDRangeKernelCommand() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#261887 0x00007ffff5d7b8f8 in amd::ReferenceCountedObject::release() () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#261888 0x00007ffff5d4154f in clReleaseEvent () from /usr/lib64/OpenCL/vendors/amd/libamdocl64.so

#261889 0x00000000004af5c2 in cl::detail::ReferenceHandler<_cl_event*>::release (event=0x2ac4ad00) at /usr/include/CL/cl.hpp:1086

#261890 0x00000000004b10b1 in cl::detail::Wrapper<_cl_event*>::release (this=0x12d1908) at /usr/include/CL/cl.hpp:1133

#261891 0x00000000004affe8 in cl::detail::Wrapper<_cl_event*>::~Wrapper (this=0x12d1908, __in_chrg=<optimized out>) at /usr/include/CL/cl.hpp:1103

#261892 0x00000000004af650 in cl::Event::~Event (this=0x12d1908, __in_chrg=<optimized out>) at /usr/include/CL/cl.hpp:1538

#261893 0x00000000004b36c8 in std::_Destroy<cl::Event> (__pointer=0x12d1908) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:89

#261894 0x00000000004b2fdc in std::_Destroy_aux<false>::__destroy<cl::Event*> (__first=0x12d1908, __last=0x12d1910) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:99

#261895 0x00000000004b2329 in std::_Destroy<cl::Event*> (__first=0x12d1900, __last=0x12d1910) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:122

#261896 0x00000000004b1435 in std::_Destroy<cl::Event*, cl::Event> (__first=0x12d1900, __last=0x12d1910) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_construct.h:148

#261897 0x00000000004b0592 in std::vector<cl::Event, std::allocator<cl::Event> >::~vector (this=0x7fffffffd220, __in_chrg=<optimized out>)

    at /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/include/g++-v4/bits/stl_vector.h:313

#261898 0x00000000004af153 in amethyst::lib::Universe::cl_integrate (this=0x7fffffffd330) at /home/beau/src/amethyst/trunk/lib/universe.cpp:644

#261899 0x0000000000496f3b in amethyst::lib::test_rk4 () at /home/beau/src/amethyst/trunk/lib/test.cpp:227

#261900 0x0000000000493a0b in amethyst::Console_Menu::run (this=0x97b500, command="testrk4") at /home/beau/src/amethyst/trunk/lib/console_menu.cpp:88

#261901 0x0000000000492f94 in amethyst::command_parse (command="testrk4") at /home/beau/src/amethyst/trunk/lib/console.cpp:215

#261902 0x0000000000492a9d in amethyst::start_console () at /home/beau/src/amethyst/trunk/lib/console.cpp:85

#261903 0x0000000000492848 in main (argc=1, argv=0x7fffffffd808) at /home/beau/src/amethyst/trunk/lib/main.cpp:22

</code>

Marix
Adept II

What GPU are you using? I think I am having the same issue on the 7970 (Cypress is fine). Where did you find the debug symbols for the libamdocl64.so?


I'm using a 6970.  Wow, I didn't know that the 7000 series was supported on Linux yet.

As for debug symbols, gdb did that for me automagically. I honestly forget how to pull them manually; nm -C reports that there aren't any symbols.

I've been playing with this, and it appears to be related to the ridiculously long dependency chain. When I break up the workload by calling clFinish() after every 1000 iterations and starting the event chain over from scratch, I don't get these issues.

I don't think the events at the beginning of the queue get freed until the last event gets freed. Just speculating.

- Beau

Message was edited by: Beau Bellamy. I miswrote the original message; I meant clFinish(), not clFlush().
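
The workaround described above (drain the queue after a fixed number of iterations, then start a fresh event chain) can be sketched without any OpenCL dependency. This is only a toy model, not the poster's code and not the AMD runtime: `FakeEvent`, `FakeQueue`, and `run_chunked` are hypothetical names, `finish()` stands in for `clFinish()`, and the depth counter merely illustrates how many predecessors a single event could keep reachable through its wait list.

```cpp
#include <cstddef>

// Hypothetical stand-in for an OpenCL event. chain_depth counts how many
// commands are linked together through wait lists, ending at this one.
struct FakeEvent {
    std::size_t chain_depth = 0;
};

// Hypothetical stand-in for an in-order command queue. It only records
// the longest dependency chain seen and how often it was drained.
struct FakeQueue {
    std::size_t max_depth = 0;
    std::size_t finishes = 0;

    // "Enqueue" a command that optionally waits on a previous event.
    FakeEvent enqueue(const FakeEvent* wait_on) {
        FakeEvent e;
        e.chain_depth = wait_on ? wait_on->chain_depth + 1 : 1;
        if (e.chain_depth > max_depth) max_depth = e.chain_depth;
        return e;
    }

    // Stand-in for clFinish(): block until all queued work has completed.
    void finish() { ++finishes; }
};

// Run `iterations` dependent steps, draining the queue and restarting the
// event chain every `chunk` steps. `chunk` must be non-zero.
void run_chunked(FakeQueue& q, std::size_t iterations, std::size_t chunk) {
    FakeEvent prev;
    bool have_prev = false;
    for (std::size_t i = 0; i < iterations; ++i) {
        prev = q.enqueue(have_prev ? &prev : nullptr);
        have_prev = true;
        if ((i + 1) % chunk == 0) {
            q.finish();         // wait for everything queued so far
            have_prev = false;  // next step starts a new chain from scratch
        }
    }
    q.finish();                 // drain whatever remains
}
```

In this model, chunking bounds the longest chain by the chunk size rather than by the total iteration count. In real code the stand-ins would be replaced by the actual enqueue calls and `clFinish()` (or `cl::CommandQueue::finish()` in the C++ wrapper), with the previous chunk's `cl::Event` objects dropped before the next chunk begins.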