1 Reply Latest reply on Mar 3, 2018 8:43 AM by jpsollie

    OpenCL segmentation fault in event callback while launching clEnqueueNDRangeKernel()

    jpsollie

      Hello everyone,

      The history of my program is somewhat unusual, so I hope you will not mind that I am not posting the full code.

      For my wedding, I received a PDF that is protected with 256-bit encryption, and its password is hashed with SHA-256. The task is to decrypt it. The password is longer than 7 characters and contains basic special characters ...

      I then decided to build a powerful OpenCL brute-force system:

      host (iterates over max_strlen - 6 characters) -> client (iterates over 2 characters) -> OpenCL device (iterates over 4 characters, one on the host, 3 on the device)

      Each -> means that multiple connections are possible. The -> between host and client is implemented over TCP/IPv4.
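
      To make the split concrete, here is a minimal sketch in C of how one candidate could be assembled from the three levels. It is only an illustration under assumptions: the charset, the function names index_to_chars and build_candidate, and the index encoding are hypothetical and are not taken from the real program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical charset: the real one (with special characters) is not shown. */
static const char CHARSET[] = "abcdefghijklmnopqrstuvwxyz";
#define CHARSET_LEN (sizeof(CHARSET) - 1)

/* Write `count` characters for `index` into out, least significant last. */
static void index_to_chars(unsigned long index, size_t count, char *out)
{
    for (size_t pos = count; pos-- > 0; ) {
        out[pos] = CHARSET[index % CHARSET_LEN];
        index /= CHARSET_LEN;
    }
}

/* Assemble: host prefix | 2 client characters | 4 device characters.
   On a too-small buffer the result is the empty string. */
static void build_candidate(const char *host_prefix, unsigned long client_idx,
                            unsigned long device_idx, char *out, size_t out_size)
{
    assert(host_prefix != NULL && out != NULL && out_size > 0);

    size_t prefix_len = strlen(host_prefix);
    if (prefix_len + 6 + 1 > out_size) {
        out[0] = '\0';
        return;
    }
    memcpy(out, host_prefix, prefix_len);
    index_to_chars(client_idx, 2, out + prefix_len);      /* client level */
    index_to_chars(device_idx, 4, out + prefix_len + 2);  /* device level */
    out[prefix_len + 6] = '\0';
}
```

      With this encoding, each level only needs to receive its own index range, which is what makes it possible to fan the work out over several clients and devices.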

      The server part works as designed.

      So does the OpenCL code.

      The client code does what it should. I have already tried to crack the string "aaaaaa"; it found a solution on OpenCL device 0 and reported it to the server.

      Now, the problem.

      It occurs on this server, which contains the following hardware:

       

      Number of platforms:                         2
        Platform Profile:                          FULL_PROFILE
        Platform Version:                          OpenCL 2.1 AMD-APP (2527.3)
        Platform Name:                             AMD Accelerated Parallel Processing
        Platform Vendor:                           Advanced Micro Devices, Inc.
        Platform Extensions:                       cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
        Platform Profile:                          FULL_PROFILE
        Platform Version:                          OpenCL 1.2 pocl 1.0, LLVM 5.0.1
        Platform Name:                             Portable Computing Language
        Platform Vendor:                           The pocl project
        Platform Extensions:                       cl_khr_icd

       

       

        Platform Name:                             AMD Accelerated Parallel Processing
      Number of devices:                           3
        Device Type:                               CL_DEVICE_TYPE_GPU
        Vendor ID:                                 1002h
        Board name:                                AMD Radeon Graphics
        Device Topology:                           PCI[ B#4, D#0, F#0 ]
        Max compute units:                         36
        Max work items dimensions:                 3
      Max work items[0]:                       1024
      Max work items[1]:                       1024
      Max work items[2]:                       1024
        Max work group size:                       256
        Preferred vector width char:               4
        Preferred vector width short:              2
        Preferred vector width int:                1
        Preferred vector width long:               1
        Preferred vector width float:              1
        Preferred vector width double:             1
        Native vector width char:                  4
        Native vector width short:                 2
        Native vector width int:                   1
        Native vector width long:                  1
        Native vector width float:                 1
        Native vector width double:                1
        Max clock frequency:                       300Mhz
        Address bits:                              64
        Max memory allocation:                     3422315315
        Image support:                             Yes
        Max number of images read arguments:       128
        Max number of images write arguments:      8
        Max image 2D width:                        16384
        Max image 2D height:                       16384
        Max image 3D width:                        2048
        Max image 3D height:                       2048
        Max image 3D depth:                        2048
        Max samplers within kernel:                16
        Max size of kernel argument:               1024
        Alignment (bits) of base address:          2048
        Minimum alignment (bytes) for any datatype:128

       

       

        Device Type:                               CL_DEVICE_TYPE_GPU
        Vendor ID:                                 1002h
        Board name:                                AMD Radeon (TM) R9 Fury Series
        Device Topology:                           PCI[ B#66, D#0, F#0 ]
        Max compute units:                         64
        Max work items dimensions:                 3
      Max work items[0]:                       1024
      Max work items[1]:                       1024
      Max work items[2]:                       1024
        Max work group size:                       256
        Preferred vector width char:               4
        Preferred vector width short:              2
        Preferred vector width int:                1
        Preferred vector width long:               1
        Preferred vector width float:              1
        Preferred vector width double:             1
        Native vector width char:                  4
        Native vector width short:                 2
        Native vector width int:                   1
        Native vector width long:                  1
        Native vector width float:                 1
        Native vector width double:                1
        Max clock frequency:                       300Mhz
        Address bits:                              64
        Max memory allocation:                     3422315315
        Image support:                             Yes
        Max number of images read arguments:       128
        Max number of images write arguments:      8
        Max image 2D width:                        16384
        Max image 2D height:                       16384
        Max image 3D width:                        2048
        Max image 3D height:                       2048
        Max image 3D depth:                        2048
        Max samplers within kernel:                16
        Max size of kernel argument:               1024
        Alignment (bits) of base address:          2048
        Minimum alignment (bytes) for any datatype:128

       

       

       

        Device Type:                               CL_DEVICE_TYPE_GPU
        Vendor ID:                                 1002h
        Board name:                                AMD Radeon R9 200 Series
        Device Topology:                           PCI[ B#65, D#0, F#0 ]
        Max compute units:                         28
        Max work items dimensions:                 3
      Max work items[0]:                       1024
      Max work items[1]:                       1024
      Max work items[2]:                       1024
        Max work group size:                       256
        Preferred vector width char:               4
        Preferred vector width short:              2
        Preferred vector width int:                1
        Preferred vector width long:               1
        Preferred vector width float:              1
        Preferred vector width double:             1
        Native vector width char:                  4
        Native vector width short:                 2
        Native vector width int:                   1
        Native vector width long:                  1
        Native vector width float:                 1
        Native vector width double:                1
        Max clock frequency:                       300Mhz
        Address bits:                              64
        Max memory allocation:                     1596954214
        Image support:                             Yes
        Max number of images read arguments:       128
        Max number of images write arguments:      8
        Max image 2D width:                        16384
        Max image 2D height:                       16384
        Max image 3D width:                        2048
        Max image 3D height:                       2048
        Max image 3D depth:                        2048
        Max samplers within kernel:                16
        Max size of kernel argument:               1024
        Alignment (bits) of base address:          2048
        Minimum alignment (bytes) for any datatype:128

       

       

        Platform Name:                             Portable Computing Language
      Number of devices:                           1
        Device Type:                               CL_DEVICE_TYPE_CPU
        Vendor ID:                                 1022h
        Max compute units:                         32
        Max work items dimensions:                 3
      Max work items[0]:                       4096
      Max work items[1]:                       4096
      Max work items[2]:                       4096
        Max work group size:                       4096
        Preferred vector width char:               16
        Preferred vector width short:              16
        Preferred vector width int:                8
        Preferred vector width long:               4
        Preferred vector width float:              8
        Preferred vector width double:             4
        Native vector width char:                  16
        Native vector width short:                 16
        Native vector width int:                   8
        Native vector width long:                  4
        Native vector width float:                 8
        Native vector width double:                4
        Max clock frequency:                       2300Mhz
        Address bits:                              64
        Max memory allocation:                     34359738368
        Image support:                             Yes
        Max number of images read arguments:       128
        Max number of images write arguments:      128
        Max image 2D width:                        32768
        Max image 2D height:                       32768
        Max image 3D width:                        2048
        Max image 3D height:                       2048
        Max image 3D depth:                        2048
        Max samplers within kernel:                16
        Max size of kernel argument:               1024
        Alignment (bits) of base address:          1024
        Minimum alignment (bytes) for any datatype:128

      (Output of clinfo, somewhat trimmed.)

      I have a strange problem: the GPU devices cause a segmentation fault when clEnqueueNDRangeKernel() is called from the event callback.

      This does not happen on a laptop where the Intel OpenCL SDK is installed together with the NVIDIA drivers.

      I am not saying it is AMD's fault, as I simply don't know. I suspect a memory leak:

      In the 'start' function of the OpenCL program:

      [code]
      Breakpoint 1, start_with_events (in_devs=0x60c4b0, amount=4) at src/openclhost.c:361
      361             err = clEnqueueNDRangeKernel(in_devs[i].command_queue, in_devs[i].kernels[in_devs[i].hosting_iterator->current_char % 4], 3, NULL, globalworkids, localworkids, 0, NULL, &(in_devs[i].event));
      (gdb) next
      362             if(err != CL_SUCCESS) {
      (gdb) next
      366             err = clSetEventCallback(in_devs[i].event, CL_COMPLETE, nextevent, pass);
      (gdb) next
      367                     if(err != CL_SUCCESS) {
      (gdb) print in_devs[i].event
      $1 = (cl_event) 0x12f4440
      [/code]
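
      One difference visible in the clinfo output is the maximum work-group size: 256 on the three GPUs versus 4096 on the pocl CPU device. An invalid size would normally be reported as an error code rather than a crash, so the following is only a sanity check that could be run on globalworkids and localworkids before line 361. The struct work_limits and the function work_sizes_ok are hypothetical names, and the divisibility rule assumes OpenCL 1.x-style uniform work-groups.

```c
#include <assert.h>
#include <stddef.h>

/* Device limits. In real code these would be queried with clGetDeviceInfo
   (CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES). */
struct work_limits {
    size_t max_group_size;      /* 256 on the GPUs listed by clinfo */
    size_t max_item_sizes[3];   /* 1024 per dimension on the GPUs */
};

/* Returns 1 if the 3-dimensional global/local sizes respect the limits,
   0 otherwise. */
static int work_sizes_ok(const struct work_limits *lim,
                         const size_t global[3], const size_t local[3])
{
    assert(lim != NULL && global != NULL && local != NULL);

    size_t group = 1;
    for (int d = 0; d < 3; d++) {
        if (local[d] == 0 || local[d] > lim->max_item_sizes[d])
            return 0;                   /* per-dimension limit exceeded */
        if (global[d] == 0 || global[d] % local[d] != 0)
            return 0;                   /* global not a multiple of local */
        group *= local[d];
        if (group > lim->max_group_size)
            return 0;                   /* work-group too large */
    }
    return 1;
}
```

      A local size such as {16, 16, 4} would pass on the pocl device (1024 work-items per group) but fail this check against the GPU limit of 256.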

       

      In the callback function that launches the next NDRange kernel:

      [code]
      (gdb) continue
      Continuing.

      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x7ffbcc45f700 (LWP 927)]
      0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
      (gdb) print in_devs[i].event
      No symbol "in_devs" in current context.
      (gdb) up
      #1  0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
      (gdb) up
      #2  0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      (gdb) up
      #3  0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
      (gdb) up
      #4  0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
      (gdb) up
      #5  0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
      107             err = clEnqueueNDRangeKernel(device->command_queue, device->kernels[kernelnr], 3, NULL, gwids, lwids, 1, &(process->multiple_use), &(device->event));
      (gdb) print in_devs[i].event
      No symbol "in_devs" in current context.
      (gdb) print device->event
      $2 = (cl_event) 0x0
      (gdb) print process->deviceid
      $3 = 0
      (gdb) bt
      #0  0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
      #1  0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
      #2  0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #3  0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
      #4  0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
      #5  0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
      #6  0x00007ffff3465619 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #7  0x00007ffff34656c2 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #8  0x00007ffff34b64f8 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #9  0x00007ffff34b6756 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #10 0x00007ffff34682e6 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #11 0x00007ffff346898d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #12 0x00007ffff33fd67f in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #13 0x00007ffff347998c in ?? () from /opt/amdgpu/lib64/libamdocl64.so
      #14 0x00007ffff79a28c7 in start_thread () from /lib64/libpthread.so.0
      #15 0x00007ffff76e348f in clone () from /lib64/libc.so.6
      (gdb)
      [/code]
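
      The backtrace shows launchKernel running on a thread created by libamdocl64.so (frames #6 to #15) and re-entering clEnqueueNDRangeKernel from there. One possible way to rule out a re-entrancy problem in the driver is to let the callback only record completion and have the main thread enqueue the next kernel. The sketch below shows just that handoff with C11 atomics. The names kernel_done, mark_kernel_done and take_finished_device are hypothetical, and this is not confirmed to be the cause of the crash.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_DEVICES 4   /* matches amount=4 in the first gdb session */

/* One completion flag per device: written by the driver's callback thread,
   read and cleared by the main thread. Static storage starts at zero. */
static atomic_int kernel_done[MAX_DEVICES];

/* Intended to be the whole body of the event callback: no OpenCL calls and
   no allocation, only a flag. In the real program deviceid would come from
   the user_data pointer passed to clSetEventCallback. */
static void mark_kernel_done(size_t deviceid)
{
    assert(deviceid < MAX_DEVICES);
    atomic_store(&kernel_done[deviceid], 1);
}

/* Called from the main thread's loop. Returns the index of a device whose
   kernel has finished and clears its flag, or -1 if none is pending. The
   main thread would then call clEnqueueNDRangeKernel for that device. */
static int take_finished_device(void)
{
    for (size_t i = 0; i < MAX_DEVICES; i++) {
        if (atomic_exchange(&kernel_done[i], 0) == 1)
            return (int)i;
    }
    return -1;
}
```

      If the crash disappears with this arrangement, the problem is related to enqueueing from the callback thread; if it persists, heap corruption elsewhere in the host code becomes the more likely explanation.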

       

      Any idea where an OpenCL event could be set to 0?

      I also tried valgrind, but it crashes with an illegal instruction. I have no idea how to handle this...

      Any ideas?