4 Replies Latest reply on Feb 22, 2016 2:19 PM by pszilard

    sprofile segfault

    pszilard

      The sprofile binary shipped with CodeXL 1.9 segfaults while generating profile data at the end of each run. It looks like I do get a (possibly truncated) csv file and a definitely truncated atp file.

       

      Command line:

      sprofile --apitrace --tracesummary --occupancy --perfcounter -w $PWD -o prof $PATH_TO_BINARY

       

      Backtrace:

      Program received signal SIGSEGV, Segmentation fault.

      0x0000000000524a57 in std::__detail::_List_node_base::_M_unhook() ()

      (gdb) bt

      #0  0x0000000000524a57 in std::__detail::_List_node_base::_M_unhook() ()

      #1  0x000000000047e888 in CLAtpFilePart::MergeTimestamp(std::string const&, std::map<std::string, std::list<GPUTimestamp, std::allocator<GPUTimestamp> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::list<GPUTimestamp, std::allocator<GPUTimestamp> > > > >&, std::vector<CLAPIInfo*, std::allocator<CLAPIInfo*> >&) ()

      #2  0x0000000000480097 in CLAtpFilePart::UpdateTmpTimestampFiles(std::string const&, std::string const&) ()

      #3  0x0000000000480ae6 in CLAtpFilePart::WriteContentSection(std::basic_ofstream<char, std::char_traits<char> >&, std::string const&, std::string const&) ()

      #4  0x0000000000497061 in AtpFileWriter::SaveToAtpFile() ()

      #5  0x0000000000416b7a in MergeTraceFile(int) [clone .isra.160] ()

      #6  0x0000000000417359 in MergeFragFiles(int) ()

      #7  0x000000000041a99f in main ()

       

      Is there a better place (e.g. a proper bug tracker) to report bugs?

        • Re: sprofile segfault
          chesik

          Do you see this with any application (like perhaps one of the APP SDK samples) or is it specific to your particular application?

           

          Does this only happen when you include both --apitrace and --perfcounter on the sprofile command line?  If you omit one of those switches, does it make a difference?

           

          Chris

            • Re: sprofile segfault
              pszilard

              > Do you see this with any application (like perhaps one of the APP SDK samples) or is it specific to your particular application?

              Not sure, because I have started getting a strange error:

              sprofile --perfcounter -w /nethome/pszilard/tools/amd-appsdk_2.9_samples/opencl/cl/BufferBandwidth/bin/x86_64/Release -o prof /nethome/pszilard/tools/amd-appsdk_2.9_samples/opencl/cl/BufferBandwidth/bin/x86_64/Release/BufferBandwidth

              AMD CodeXL GPU Profiler V3.1.10132 is Enabled

              *** Error in `/nethome/pszilard/tools/amd-appsdk_2.9_samples/opencl/cl/BufferBandwidth/bin/x86_64/Release/BufferBandwidth': free(): invalid pointer: 0x00007fe83fe777b8 ***

               

              > Does this only happen when you include both --apitrace and --perfcounter on the sprofile command line?  If you omit one of those switches, does it make a difference?

              I think so, I was able to profile with only "--occupancy --perfcounter" earlier today.
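
One way to narrow this down is to try each profiling mode on its own and see which run crashes. The sketch below only prints the command lines rather than running them. The flags are the ones already used in this thread; whether sprofile accepts each of them in isolation is an assumption, and the function name and the per-mode output names (prof_apitrace and so on) are made up for illustration.

```shell
#!/bin/sh
# Print one sprofile command line per profiling mode so that each mode can
# be tried separately. The flags are the ones used earlier in this thread;
# the per-mode output names (prof_apitrace, ...) are hypothetical.
print_isolated_runs() {
    binary=$1
    for flag in --apitrace --occupancy --perfcounter; do
        # "$PWD" is printed literally so the shell expands it when the
        # printed command is pasted and run.
        printf 'sprofile %s -w "$PWD" -o prof_%s %s\n' "$flag" "${flag#--}" "$binary"
    done
}

# Example: list the commands for a placeholder binary path.
print_isolated_runs ./PrefixSum
```

Running the printed commands one at a time should show whether a single mode is enough to trigger the crash or whether it only appears when modes are combined.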

                • Re: sprofile segfault
                  pszilard

                  Actually, it looks like no matter which program I try to profile and which combination of sprofile command-line arguments I use, if I pass "-p/--perfcounter" I now always get the above-mentioned error, e.g. with the PrefixSum SDK example:

                  $ sprofile -p ./PrefixSum

                  AMD CodeXL GPU Profiler V3.1.10132 is Enabled

                  *** Error in `/nethome/pszilard/tools/amd-appsdk_2.9_samples/opencl/cl/PrefixSum/bin/x86_64/Release/./PrefixSum': free(): invalid pointer: 0x00007f4cae8cf7b8 ***

                  Failed to generate profile result /nethome/pszilard/Session1.csv.

                  The pointers it complains about seem to be quite close to each other, e.g. with another application I got 0x00007fb407f487b8.
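
The addresses in these error messages are not just close: they all end in the same three hex digits. That would be consistent with the same object inside a shared library being loaded at a different randomized base address on each run, although it does not prove it. A minimal sketch to check this, assuming 4 KiB pages and a shell with 64-bit arithmetic (such as bash); page_offset is a helper name made up for illustration:

```shell
#!/bin/bash
# Print the offset of an address within its 4 KiB page (the low 12 bits).
page_offset() {
    printf '0x%x\n' $(( $1 & 0xfff ))
}

# Addresses copied from the free() error messages in this thread.
for addr in 0x00007fe83fe777b8 0x00007f4cae8cf7b8 0x00007fb407f487b8; do
    echo "$addr -> $(page_offset "$addr")"
done
```

Each of the three addresses yields the page offset 0x7b8.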

                   

                  This is really weird, as I was able to generate profile data on the same machine, with the same driver and the same hardware, about a week ago.

                    • Re: sprofile segfault
                      pszilard

                      I figured this out and it is not pretty. Connecting to the remote headless Linux server (Ubuntu 14.04, no X running) without X forwarding causes sprofile to always crash with -p, but when connecting with X forwarding, it seems to run fine.
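
Until this is fixed, a small guard can at least flag the risky case before a run starts. The sketch below is a workaround idea, not something sprofile provides: it assumes that an unset or empty DISPLAY variable is a good enough sign that the session has no X forwarding, and both function names are hypothetical.

```shell
#!/bin/sh
# Return 0 (true) if the given sprofile arguments request performance
# counters (-p or --perfcounter) while no X display is available.
# Assumption: an empty or unset DISPLAY means no X forwarding.
perfcounter_without_display() {
    [ -n "${DISPLAY:-}" ] && return 1
    for arg in "$@"; do
        case $arg in
            -p|--perfcounter) return 0 ;;
        esac
    done
    return 1
}

# Hypothetical wrapper: warn before starting a run that is likely to crash,
# then invoke sprofile with the unchanged arguments.
safe_sprofile() {
    if perfcounter_without_display "$@"; then
        echo "warning: DISPLAY is not set; sprofile -p crashed in this setup, consider reconnecting with 'ssh -X'" >&2
    fi
    sprofile "$@"
}
```

For example, `safe_sprofile -p ./PrefixSum` prints the warning in a session without X forwarding and then runs sprofile as usual.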

                      IMO such bugs are preventable and should be caught by beta testing, and it is very concerning that they are not. Is there a beta program for CodeXL?

                       

                      Here's the backtrace, just in case...

                       

                      (gdb) r

                      Starting program: /tmp/PrefixSum

                      [Thread debugging using libthread_db enabled]

                      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

                      AMD CodeXL GPU Profiler V3.1.10132 is Enabled

                      *** Error in `/tmp/PrefixSum': free(): invalid pointer: 0x00007ffff73ac7b8 ***

                       

                       

                      Program received signal SIGABRT, Aborted.

                      0x00007ffff7024cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

                      56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

                      (gdb) bt

                      #0  0x00007ffff7024cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

                      #1  0x00007ffff70280d8 in __GI_abort () at abort.c:89

                      #2  0x00007ffff7061394 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff716fb28 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175

                      #3  0x00007ffff706d66e in malloc_printerr (ptr=<optimized out>, str=0x7ffff716bc19 "free(): invalid pointer", action=1) at malloc.c:4996

                      #4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840

                      #5  0x00007ffff7e3880d in ?? () from /usr/lib/libatiadlxx.so

                      #6  0x00007ffff7e48605 in ADL2_Main_Control_Destroy () from /usr/lib/libatiadlxx.so

                      #7  0x00007fffedefe1c7 in AMDTADLUtils::Unload() () from /opt/tcbsys/amd/codexl/1.9.10132/x86_64/libCLProfileAgent.so

                      #8  0x00007fffedefe568 in AMDTADLUtils::LoadAndInit() () from /opt/tcbsys/amd/codexl/1.9.10132/x86_64/libCLProfileAgent.so

                      #9  0x00007fffedefd4d7 in AMDTADLUtils::GetAsicInfoList(std::vector<ADLUtil_ASICInfo, std::allocator<ADLUtil_ASICInfo> >&) () from /opt/tcbsys/amd/codexl/1.9.10132/x86_64/libCLProfileAgent.so

                      #10 0x00007fffedeabce5 in CLGPAProfiler::Init(Parameters const&, std::string&) () from /opt/tcbsys/amd/codexl/1.9.10132/x86_64/libCLProfileAgent.so

                      #11 0x00007fffedea1ef7 in InitProfiler() () from /opt/tcbsys/amd/codexl/1.9.10132/x86_64/libCLProfileAgent.so

                      #12 0x00007fffedea1455 in clAgent_OnLoad () from /opt/tcbsys/amd/codexl/1.9.10132/x86_64/libCLProfileAgent.so

                      #13 0x00007ffff32c819d in ?? () from /usr/lib/libamdocl64.so

                      #14 0x00007ffff32c8f77 in ?? () from /usr/lib/libamdocl64.so

                      #15 0x00007ffff32d8305 in ?? () from /usr/lib/libamdocl64.so

                      #16 0x00007ffff32a8e53 in clIcdGetPlatformIDsKHR () from /usr/lib/libamdocl64.so

                      #17 0x00007ffff7bd55b2 in ?? () from /opt/tcbsys/amd/appsdk/2.9/lib/x86_64/libOpenCL.so.1

                      #18 0x00007ffff7bd7986 in ?? () from /opt/tcbsys/amd/appsdk/2.9/lib/x86_64/libOpenCL.so.1

                      #19 0x00007ffff6ddda90 in pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103

                      #20 0x00007ffff7bd7747 in ?? () from /opt/tcbsys/amd/appsdk/2.9/lib/x86_64/libOpenCL.so.1

                      #21 0x00007ffff7bd57a5 in ?? () from /opt/tcbsys/amd/appsdk/2.9/lib/x86_64/libOpenCL.so.1

                      #22 0x00007ffff7bd6f20 in clGetPlatformIDs () from /opt/tcbsys/amd/appsdk/2.9/lib/x86_64/libOpenCL.so.1

                      #23 0x0000000000412622 in appsdk::CLCommandArgs::validatePlatformAndDeviceOptions (this=0x622010) at /opt/tcbsys/amd/appsdk/2.9/include/SDKUtil/CLUtil.hpp:1121

                      #24 0x000000000041235e in appsdk::CLCommandArgs::parseCommandLine (this=0x622010, argc=1, argv=0x7fffffffe4a8) at /opt/tcbsys/amd/appsdk/2.9/include/SDKUtil/CLUtil.hpp:1103

                      #25 0x000000000040f059 in main (argc=1, argv=0x7fffffffe4a8) at /nethome/pszilard/tools/amd-appsdk_2.9_samples/opencl/cl/PrefixSum/PrefixSum.cpp:621