1 Reply Latest reply on Jul 9, 2015 4:04 AM by pinform

    ACML 6.1 SGEMM and DGEMM bug in example code

    emartin

      Please move this to AMD Compute Library forum after it's been approved.

       

      I'm running ACML 6.1.0.31 compiled with gfortran for 64-bit Linux (available here: acml-6.1.0.31-gfortran64.tgz).

       

      Running the time_sgemm or time_dgemm example code with either acml or acml_mp causes the process to hang with 0 CPU usage.

      I ran the program under GDB and pressed Ctrl-C after it had been hung for a while to see where it was stuck.

       

      Starting program: /home/emartin/.local/lib/acml-6.1/gfortran64/examples/performance/time_sgemm.exe

      [Thread debugging using libthread_db enabled]

      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

      ^C

      Program received signal SIGINT, Interrupt.

      syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

      38    ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.

      (gdb) bt

      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

      #1  0x00007ffff4871371 in __cxa_guard_acquire () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

      #2  0x00007ffff3f12a22 in horus::cl::Control::getDefault() () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml_bridge.so

      #3  0x00007ffff3f82677 in horus::lua::luaState::loadLuaFile(boost::filesystem::path const&, boost::filesystem::path const&, char const*, lua_State*, int&) ()

         from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml_bridge.so

      #4  0x00007ffff3f122c3 in horus::cl::Control::Control() () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml_bridge.so

      #5  0x00007ffff3f12b2c in horus::cl::Control::getDefault() () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml_bridge.so

      #6  0x00007ffff3f31889 in void GemmThreshold<float>(char const*, char const*, int const*, int const*, int const*, float const*, float const*, int const*, int const*, float const*, float const*, int const*, char const*, int*) () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml_bridge.so

      #7  0x00007ffff3f308f0 in SGEMMTHRESHAPU () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml_bridge.so

      #8  0x00007ffff59d340f in sgemmthreshapu_dynamic_ () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml.so

      #9  0x00007ffff66419a1 in sgemmp_ () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml.so

      #10 0x00007ffff6641feb in sgemm_ () from /home/emartin/.local/lib/acml-6.1/gfortran64/lib/libacml.so

      #11 0x00000000004012d7 in dotime_ ()

      #12 0x00000000004019d6 in MAIN__ ()

       

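      The hang is in __cxa_guard_acquire, and it looks related to the Lua interop: horus::cl::Control::getDefault() appears twice in the backtrace (frames #5 and #2), with the Control constructor and loadLuaFile in between, so the one-time initialization inside getDefault() seems to re-enter itself and wait on its own guard. The behavior was the same for all four combinations (time_sgemm/time_dgemm with acml/acml_mp).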
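      To illustrate the pattern I think the backtrace shows, here is a minimal sketch. It is not ACML code: the Control class, loadConfig(), and initializationReenters() below are hypothetical stand-ins I made up for horus::cl::Control and loadLuaFile. I am also assuming getDefault() uses a function-local static, which is what __cxa_guard_acquire usually indicates. Re-entering such a static while it is still being initialized is undefined behavior in C++, and with libstdc++ it can show up as a thread waiting forever on its own guard. So the sketch does not hang, it detects the re-entry with a thread-local flag and throws before the guard is touched a second time.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for horus::cl::Control; not the real ACML class.
class Control {
public:
    Control();
    static Control& getDefault();
    int loadedSettings = 0;
};

// True while getDefault() is constructing the instance on this thread.
static thread_local bool g_initializing = false;

// RAII helper: sets the flag on entry and clears it on exit, including
// exit via an exception.
struct InitScope {
    InitScope() { assert(!g_initializing); g_initializing = true; }
    ~InitScope() { g_initializing = false; }
};

// Hypothetical stand-in for loadLuaFile: it asks for the default Control
// while that Control is still being constructed.
static void loadConfig() {
    Control::getDefault();
}

Control::Control() {
    loadConfig();          // re-enters getDefault(), as in frames #4 -> #3 -> #2
    loadedSettings = 1;
}

Control& Control::getDefault() {
    // Without this check, the nested call would reach the guard of `instance`
    // again while the outer call still holds it (undefined behavior).
    if (g_initializing) {
        throw std::logic_error("recursive initialization of Control::getDefault()");
    }
    InitScope scope;
    static Control instance;   // guarded one-time initialization
    return instance;
}

// Returns true if constructing the default Control re-enters getDefault().
bool initializationReenters() {
    try {
        Control::getDefault();
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

      Calling initializationReenters() returns true here, because the constructor reaches getDefault() before the static has finished initializing. Whether this is what actually happens inside libacml_bridge.so is a guess from the backtrace alone.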

       

      This error occurred on Ubuntu 15.04 with kernel release 3.19.0-15-generic. My CPU is an Intel Core i5-5200U with integrated Intel HD 5500 graphics, and I'm running Beignet 1.0.2 for OpenCL support. I can provide any other output that helps with debugging, such as /proc/cpuinfo or clinfo (although the bug appears to happen outside of OpenCL, which is why I'm filing against ACML and not clBLAS).