
OpenCL segmentation fault in event callback while calling clEnqueueNDRangeKernel()

Question asked by jpsollie on Mar 3, 2018
Latest reply on Mar 3, 2018 by jpsollie

Hello everyone,

The history of my program is somewhat unusual, so I hope you will not mind that I am not posting the full code:

For my wedding, I received a PDF protected with 256-bit encryption, and the password is hashed with SHA-256. The task was to decrypt it. The password is longer than 7 characters and contains basic special characters ...

I then decided to build a powerful OpenCL brute-force system:

host (iterates over max_strlen - 6 characters) -> client (iterates over 2 characters) -> OpenCL device (iterates over 4 characters, one on the host, 3 on the device)

Every -> means that multiple connections are possible. The -> between host and client uses TCP/IPv4.

The server part works as designed.

So does the OpenCL code.

The client code does what it should do. I have already tried to decode the string "aaaaaa"; it found a solution on OpenCL device 0 and reported it to the server.
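To make the last stage of that chain concrete, here is a minimal sketch of how the four device-side characters could be addressed. It is an illustration under assumptions, not the actual program: the character set `CHARSET` and the helpers `fill_device_suffix` and `candidates_per_launch` are hypothetical names, and `gx`, `gy`, `gz` stand in for what `get_global_id(0)`, `get_global_id(1)` and `get_global_id(2)` would return inside a kernel launched with three work dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical character set: the post only says the password has more
   than 7 characters and contains basic special characters. */
static const char CHARSET[] = "abcdefghijklmnopqrstuvwxyz0123456789!?";

/* Fill the last four characters of a candidate of length len.
   host_char  : index picked by the host-side loop (one launch per value)
   gx, gy, gz : stand-ins for the three global ids of one work item */
static void fill_device_suffix(char *candidate, size_t len,
                               size_t host_char,
                               size_t gx, size_t gy, size_t gz)
{
    assert(len >= 4);
    candidate[len - 4] = CHARSET[host_char];
    candidate[len - 3] = CHARSET[gx];
    candidate[len - 2] = CHARSET[gy];
    candidate[len - 1] = CHARSET[gz];
}

/* Candidates covered by one kernel launch: n^3 for a character set of n. */
static size_t candidates_per_launch(void)
{
    size_t n = sizeof(CHARSET) - 1; /* exclude the terminating NUL */
    return n * n * n;
}
```

With this layout the global work size in each dimension would be the character set size, and the host loop advances the fourth character between launches.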

Now, the problem. It occurs on this server, which contains the following hardware:

 

Number of platforms:                         2
  Platform Profile:                          FULL_PROFILE
  Platform Version:                          OpenCL 2.1 AMD-APP (2527.3)
  Platform Name:                             AMD Accelerated Parallel Processing
  Platform Vendor:                           Advanced Micro Devices, Inc.
  Platform Extensions:                       cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Profile:                          FULL_PROFILE
  Platform Version:                          OpenCL 1.2 pocl 1.0, LLVM 5.0.1
  Platform Name:                             Portable Computing Language
  Platform Vendor:                           The pocl project
  Platform Extensions:                       cl_khr_icd

 

 

  Platform Name:                             AMD Accelerated Parallel Processing
Number of devices:                           3
  Device Type:                               CL_DEVICE_TYPE_GPU
  Vendor ID:                                 1002h
  Board name:                                AMD Radeon Graphics
  Device Topology:                           PCI[ B#4, D#0, F#0 ]
  Max compute units:                         36
  Max work items dimensions:                 3
    Max work items[0]:                       1024
    Max work items[1]:                       1024
    Max work items[2]:                       1024
  Max work group size:                       256
  Preferred vector width char:               4
  Preferred vector width short:              2
  Preferred vector width int:                1
  Preferred vector width long:               1
  Preferred vector width float:              1
  Preferred vector width double:             1
  Native vector width char:                  4
  Native vector width short:                 2
  Native vector width int:                   1
  Native vector width long:                  1
  Native vector width float:                 1
  Native vector width double:                1
  Max clock frequency:                       300Mhz
  Address bits:                              64
  Max memory allocation:                     3422315315
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                        16384
  Max image 2D height:                       16384
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          2048
  Minimum alignment (bytes) for any datatype:128

 

 

  Device Type:                               CL_DEVICE_TYPE_GPU
  Vendor ID:                                 1002h
  Board name:                                AMD Radeon (TM) R9 Fury Series
  Device Topology:                           PCI[ B#66, D#0, F#0 ]
  Max compute units:                         64
  Max work items dimensions:                 3
    Max work items[0]:                       1024
    Max work items[1]:                       1024
    Max work items[2]:                       1024
  Max work group size:                       256
  Preferred vector width char:               4
  Preferred vector width short:              2
  Preferred vector width int:                1
  Preferred vector width long:               1
  Preferred vector width float:              1
  Preferred vector width double:             1
  Native vector width char:                  4
  Native vector width short:                 2
  Native vector width int:                   1
  Native vector width long:                  1
  Native vector width float:                 1
  Native vector width double:                1
  Max clock frequency:                       300Mhz
  Address bits:                              64
  Max memory allocation:                     3422315315
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                        16384
  Max image 2D height:                       16384
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          2048
  Minimum alignment (bytes) for any datatype:128

 

 

 

  Device Type:                               CL_DEVICE_TYPE_GPU
  Vendor ID:                                 1002h
  Board name:                                AMD Radeon R9 200 Series
  Device Topology:                           PCI[ B#65, D#0, F#0 ]
  Max compute units:                         28
  Max work items dimensions:                 3
    Max work items[0]:                       1024
    Max work items[1]:                       1024
    Max work items[2]:                       1024
  Max work group size:                       256
  Preferred vector width char:               4
  Preferred vector width short:              2
  Preferred vector width int:                1
  Preferred vector width long:               1
  Preferred vector width float:              1
  Preferred vector width double:             1
  Native vector width char:                  4
  Native vector width short:                 2
  Native vector width int:                   1
  Native vector width long:                  1
  Native vector width float:                 1
  Native vector width double:                1
  Max clock frequency:                       300Mhz
  Address bits:                              64
  Max memory allocation:                     1596954214
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                        16384
  Max image 2D height:                       16384
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          2048
  Minimum alignment (bytes) for any datatype:128

 

 

  Platform Name:                             Portable Computing Language
Number of devices:                           1
  Device Type:                               CL_DEVICE_TYPE_CPU
  Vendor ID:                                 1022h
  Max compute units:                         32
  Max work items dimensions:                 3
    Max work items[0]:                       4096
    Max work items[1]:                       4096
    Max work items[2]:                       4096
  Max work group size:                       4096
  Preferred vector width char:               16
  Preferred vector width short:              16
  Preferred vector width int:                8
  Preferred vector width long:               4
  Preferred vector width float:              8
  Preferred vector width double:             4
  Native vector width char:                  16
  Native vector width short:                 16
  Native vector width int:                   8
  Native vector width long:                  4
  Native vector width float:                 8
  Native vector width double:                4
  Max clock frequency:                       2300Mhz
  Address bits:                              64
  Max memory allocation:                     34359738368
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      128
  Max image 2D width:                        32768
  Max image 2D height:                       32768
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          1024
  Minimum alignment (bytes) for any datatype:128

(output of clinfo, somewhat trimmed)

I have a strange problem: the GPU devices cause a segmentation fault when clEnqueueNDRangeKernel() is called from the event callback.

This does not happen on a laptop that has the Intel OpenCL SDK installed together with the NVIDIA drivers.

I am not saying it is AMD's fault, as I simply don't know. I suspect a memory leak.

In the 'start' function of the OpenCL program:

[code]
Breakpoint 1, start_with_events (in_devs=0x60c4b0, amount=4) at src/openclhost.c:361
361             err = clEnqueueNDRangeKernel(in_devs[i].command_queue, in_devs[i].kernels[in_devs[i].hosting_iterator->current_char % 4], 3, NULL, globalworkids, localworkids, 0, NULL, &(in_devs[i].event));
(gdb) next
362             if(err != CL_SUCCESS) {
(gdb) next
366             err = clSetEventCallback(in_devs[i].event, CL_COMPLETE, nextevent, pass);
(gdb) next
367                     if(err != CL_SUCCESS) {
(gdb) print in_devs[i].event
$1 = (cl_event) 0x12f4440
[/code]

 

In the callback function that launches the next NDRange kernel:

[code]
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffbcc45f700 (LWP 927)]
0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
(gdb) print in_devs[i].event
No symbol "in_devs" in current context.
(gdb) up
#1  0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
(gdb) up
#2  0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
(gdb) up
#3  0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
(gdb) up
#4  0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
(gdb) up
#5  0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
107             err = clEnqueueNDRangeKernel(device->command_queue, device->kernels[kernelnr], 3, NULL, gwids, lwids, 1, &(process->multiple_use), &(device->event));
(gdb) print in_devs[i].event
No symbol "in_devs" in current context.
(gdb) print device->event
$2 = (cl_event) 0x0
(gdb) print process->deviceid
$3 = 0
(gdb) bt
#0  0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
#1  0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
#2  0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#3  0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
#4  0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
#5  0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
#6  0x00007ffff3465619 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#7  0x00007ffff34656c2 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#8  0x00007ffff34b64f8 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#9  0x00007ffff34b6756 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#10 0x00007ffff34682e6 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#11 0x00007ffff346898d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#12 0x00007ffff33fd67f in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#13 0x00007ffff347998c in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#14 0x00007ffff79a28c7 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff76e348f in clone () from /lib64/libc.so.6
(gdb)
[/code]

 

Any idea where the OpenCL event could be getting set to 0?

I also tried valgrind, but it crashes with an illegal instruction, and I have no idea how to work around that.
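The backtrace shows the callback running on a thread owned by the OpenCL runtime (frames #6 to #15), so the enqueue at openclhost.c:107 happens off the main thread. One way to test whether that matters, offered as a sketch and not as a diagnosis, is to keep the callback minimal and let a host thread do the next enqueue. The snippet below uses plain pthreads and contains no OpenCL calls. `struct completion`, `completion_init`, `on_complete` and `wait_for_completion` are hypothetical names, and `void *` and `int` stand in for `cl_event` and `cl_int` so it compiles without the OpenCL headers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared state between the callback thread and the host thread. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;   /* set to 1 by the callback */
    int             status; /* execution status passed to the callback */
};

static void completion_init(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
    c->status = 0;
}

/* Shaped like an OpenCL event callback (event, status, user_data).
   It only records the status and wakes the host thread. */
static void on_complete(void *event, int status, void *user_data)
{
    struct completion *c = user_data;
    (void)event;
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    c->status = status;
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Host thread: block until the callback has fired, reset the flag and
   return the status. The next kernel would be enqueued after this returns. */
static int wait_for_completion(struct completion *c)
{
    int status;
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    c->done = 0;
    status = c->status;
    pthread_mutex_unlock(&c->lock);
    return status;
}
```

In the real program, `on_complete` would be the function registered with clSetEventCallback, and the loop that calls `wait_for_completion` would issue the next clEnqueueNDRangeKernel itself. If the crash disappears with that change, the problem is likely tied to state shared between the two threads.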
