3 Replies Latest reply on May 10, 2016 6:49 AM by dipak

    opencl: SIGSEGV with native kernels.

    ribalda

      Hi

       

      We have an OpenCL application that uses native kernels. The app was crashing with SIGSEGV from time to time. After a lot of debugging, we have narrowed the issue down to native kernel scheduling in the AMD implementation of OpenCL.

       

      We are using fglrx version 15.12 (Linux, 64-bit) and the AMD APP SDK v3.0. The clinfo output is attached to this discussion.

       

      We have created a small test case that triggers the error. In that test case we launch a native kernel that simply waits for a flag and then returns. The error is easier to trigger if the app is launched like ./native_kernel_test 2>/tmp/out_log

      A correct execution produces the following output (L = launch, S = start, E = end, F = finish):

       

       

      L 0

      S 0

      E 0

      F 0

      L 1

      S 1

      E 1

      F 1

      L 2

      S 2

      E 2

      F 2

      L 0

      S 0

      E 0

      F 0

       

      and so on.
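
      Since the failure only shows up after a minute or so, it can help to scan the captured log automatically instead of reading it by eye. The following is only a minimal sketch, not part of the test case: it assumes the log holds one record per line in the `<letter> <index>` form shown above, and that a healthy run repeats L n, S n, E n, F n with the same index n. The names `check_log`, `RECORD` and `NEXT` are made up for this illustration.

```python
import re

# Record format assumed from the sample output: one letter (L, S, E or F),
# one space, then the kernel index.
RECORD = re.compile(r"^([LSEF]) (\d+)$")

# Letter expected after each non-launch record in a healthy cycle.
NEXT = {"S": "E", "E": "F", "F": "L"}


def check_log(lines):
    """Return (line_number, text, reason) for every record that breaks the cycle."""
    problems = []
    expected = "L"   # letter we expect to see next
    current = None   # index announced by the most recent L record
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        match = RECORD.match(text)
        if match is None:
            problems.append((number, text, "unrecognised line"))
            continue
        letter, index = match.group(1), int(match.group(2))
        reasons = []
        if letter == "L":
            # A launch always starts a new cycle; flag it if the old one was unfinished.
            if expected != "L":
                reasons.append("previous cycle incomplete")
            current, expected = index, "S"
        else:
            if index != current:
                reasons.append("index does not match last L")
            if letter == expected:
                expected = NEXT[letter]
            else:
                reasons.append("out of order")
        if reasons:
            problems.append((number, text, ", ".join(reasons)))
    return problems
```

      With the redirect used above, something like `check_log(open("/tmp/out_log"))` would list the suspicious records, assuming the L/S/E/F records are what ends up in that file. A record whose index is not the one just launched is reported together with its line number.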

       

      But after a minute or so I get the following output:

      L 1

      S 1

      E 1

      F 1

      L 2

      S 2

      E 2

      S 8935824

      F 2

      L 0

      Segmentation fault

      or

       

      L 2

      S 2

      E 2

      F 2

      L 0

      S 9855280

      F 0

      L 1

      Segmentation fault

        • Re: opencl: SIGSEGV with native kernels.
          nibal

          Hi,

           

          Do you have valgrind output? Is this related to the memory corruption in the latest Crimson driver, 15.302?

           

          BR,

          Nikos

            • Re: opencl: SIGSEGV with native kernels.
              ribalda

              This is the valgrind output

               

              root@qt5022:~# valgrind ./native_kernel_test

              ==702== Memcheck, a memory error detector

              ==702== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.

              ==702== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info

              ==702== Command: ./native_kernel_test

              ==702==

              ==702== Syscall param writev(vector[...]) points to uninitialised byte(s)

              ==702==    at 0x51149C0: __writev_nocancel (syscall-template.S:84)

              ==702==    by 0xA840B43: ??? (in /usr/lib/libxcb.so.1.1.0)

              ==702==    by 0xA840F1C: ??? (in /usr/lib/libxcb.so.1.1.0)

              ==702==    by 0xA840F9C: xcb_writev (in /usr/lib/libxcb.so.1.1.0)

              ==702==    by 0xA53D1AD: _XSend (in /usr/lib/libX11.so.6.3.0)

              ==702==    by 0xA53D66D: _XReply (in /usr/lib/libX11.so.6.3.0)

              ==702==    by 0x6B89B16: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B8A320: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x688663B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B8B701: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86B66: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B57AC6: ??? (in /usr/lib/libamdocl64.so)

              ==702==  Address 0x5de1170 is 32 bytes inside a block of size 16,384 alloc'd

              ==702==    at 0x4C29BE5: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

              ==702==    by 0xA52DCC1: XOpenDisplay (in /usr/lib/libX11.so.6.3.0)

              ==702==    by 0x67ED077: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5D99D: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9613: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9845: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6777DCF: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6791CE6: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67629E2: clIcdGetPlatformIDsKHR (in /usr/lib/libamdocl64.so)

              ==702==    by 0x4E316ED: ??? (in /usr/lib/libOpenCL.so.1)

              ==702==    by 0x4E33536: ??? (in /usr/lib/libOpenCL.so.1)

              ==702==    by 0x5AED210: __pthread_once_slow (pthread_once.c:116)

              ==702==

              ==702== Conditional jump or move depends on uninitialised value(s)

              ==702==    at 0x687FD82: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86C90: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B57AC6: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B57B1B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5CFCD: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B4674F: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6856F25: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Conditional jump or move depends on uninitialised value(s)

              ==702==    at 0x6B6EC18: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B6E377: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B6E5F0: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B57EA5: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5CFCD: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B4674F: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6856F25: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Syscall param ioctl(generic) points to uninitialised byte(s)

              ==702==    at 0x5114927: ioctl (syscall-template.S:84)

              ==702==    by 0xA1D2502: uki_firegl_AllocMutex (in /usr/lib/libatiuki.so.1.0)

              ==702==    by 0x6B8BBA2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86EC0: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5D1E5: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B4674F: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6856F25: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==  Address 0xfff00036c is on thread 1's stack

              ==702==  in frame #1, created by uki_firegl_AllocMutex (???)

              ==702==

              ==702== Conditional jump or move depends on uninitialised value(s)

              ==702==    at 0x6B8B908: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86A2C: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5BC83: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6806DF5: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x685653A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9845: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6777DCF: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Conditional jump or move depends on uninitialised value(s)

              ==702==    at 0x6B8B9F0: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86A2C: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5BC83: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6806DF5: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x685653A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9845: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6777DCF: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Syscall param ioctl(generic) points to uninitialised byte(s)

              ==702==    at 0x5114927: ioctl (syscall-template.S:84)

              ==702==    by 0xA1D29AA: uki_firegl_LockHardware (in /usr/lib/libatiuki.so.1.0)

              ==702==    by 0x6B8BA6B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86A2C: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5BC83: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6806DF5: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x685653A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==  Address 0xfff0000a8 is on thread 1's stack

              ==702==  in frame #1, created by uki_firegl_LockHardware (???)

              ==702==

              ==702== Conditional jump or move depends on uninitialised value(s)

              ==702==    at 0x6B8BABE: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86A53: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5BC83: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6806DF5: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x685653A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9845: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6777DCF: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Conditional jump or move depends on uninitialised value(s)

              ==702==    at 0x6B8BB21: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86A53: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5BC83: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6806DF5: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x685653A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D08: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6892D69: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68939B2: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B755A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9845: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6777DCF: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Conditional jump or move depends on uninitialised value(s)

              ==702==    at 0x6B8B87E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B8B8BC: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86985: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B65532: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B655D4: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5A82F: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B4DA8D: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5330A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6803422: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6862934: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x68928C3: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6893A22: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Syscall param ioctl(generic) points to uninitialised byte(s)

              ==702==    at 0x5114927: ioctl (syscall-template.S:84)

              ==702==    by 0xA1D2548: uki_firegl_FreeMutex (in /usr/lib/libatiuki.so.1.0)

              ==702==    by 0x6B8B89E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B8B8BC: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B86985: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B65532: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B655D4: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5A82F: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B4DA8D: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6B5330A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6803422: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6862934: ??? (in /usr/lib/libamdocl64.so)

              ==702==  Address 0xfff000484 is on thread 1's stack

              ==702==  in frame #1, created by uki_firegl_FreeMutex (???)

              ==702==

              ==702== Syscall param write(buf) points to uninitialised byte(s)

              ==702==    at 0x510F340: __write_nocancel (syscall-template.S:84)

              ==702==    by 0x7122B31: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x7123945: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x711EE22: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6778D10: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67D411F: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x677CEFE: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x678CFEF: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6779D1A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B793B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9845: ??? (in /usr/lib/libamdocl64.so)

              ==702==  Address 0xf62225e is 15,742 bytes inside a block of size 93,692 alloc'd

              ==702==    at 0x4C28DFF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

              ==702==    by 0x7122991: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x7123945: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x711EE22: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6778D10: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67D411F: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x677CEFE: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x678CFEF: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6779D1A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B793B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B96A7: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67B9845: ??? (in /usr/lib/libamdocl64.so)

              ==702==

              ==702== Thread 3:

              ==702== Invalid read of size 8

              ==702==    at 0x67984A4: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67985A4: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x672601E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6793B4B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x5AE63AD: start_thread (pthread_create.c:337)

              ==702==  Address 0xef15230 is 224 bytes inside a block of size 328 free'd

              ==702==    at 0x4C295BB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

              ==702==    by 0x6791D7A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6782D10: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x679859C: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x672601E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6793B4B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x5AE63AD: start_thread (pthread_create.c:337)

              ==702==

              ==702== Invalid read of size 4

              ==702==    at 0x67984AB: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67985A4: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x672601E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6793B4B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x5AE63AD: start_thread (pthread_create.c:337)

              ==702==  Address 0xef1522c is 220 bytes inside a block of size 328 free'd

              ==702==    at 0x4C295BB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

              ==702==    by 0x6791D7A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6782D10: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x679859C: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x672601E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6793B4B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x5AE63AD: start_thread (pthread_create.c:337)

              ==702==

              ==702== Invalid read of size 4

              ==702==    at 0x6782B99: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x67985A4: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x672601E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6793B4B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x5AE63AD: start_thread (pthread_create.c:337)

              ==702==  Address 0xef151e0 is 144 bytes inside a block of size 328 free'd

              ==702==    at 0x4C295BB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

              ==702==    by 0x6791D7A: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6782D10: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x679859C: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x672601E: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x6793B4B: ??? (in /usr/lib/libamdocl64.so)

              ==702==    by 0x5AE63AD: start_thread (pthread_create.c:337)

              ==702==

              ^C==702==

              ==702== HEAP SUMMARY:

              ==702==     in use at exit: 5,108,674 bytes in 2,814 blocks

              ==702==   total heap usage: 315,656 allocs, 312,842 frees, 103,356,416 bytes allocated

              ==702==

              ==702== LEAK SUMMARY:

              ==702==    definitely lost: 1,708 bytes in 18 blocks

              ==702==    indirectly lost: 120 bytes in 3 blocks

              ==702==      possibly lost: 581,792 bytes in 200 blocks

              ==702==    still reachable: 4,525,054 bytes in 2,593 blocks

              ==702==         suppressed: 0 bytes in 0 blocks

              ==702== Rerun with --leak-check=full to see details of leaked memory

              ==702==

              ==702== For counts of detected and suppressed errors, rerun with: -v

              ==702== Use --track-origins=yes to see where uninitialised values come from

              ==702== ERROR SUMMARY: 165 errors from 15 contexts (suppressed: 0 from 0)

              Killed

                • Re: opencl: SIGSEGV with native kernels.
                  dipak

                  Hi,

                  My apologies for this delayed reply.

                  I'm able to reproduce the segfault on my Linux setup, though it seems to work fine on Windows. I'll check further and take the necessary action.

                  BTW, if possible, could you please check it once on Windows and share your findings?

                   

                  Regards,