OpenCL

jpsollie
Adept II

OpenCL segmentation fault in event callback while launching clEnqueueNDRangeKernel()

Hello everyone,

The history of my program is somewhat strange, so I hope you won't mind that I am not posting the full code.

For my wedding, I got a PDF protected with 256-bit encryption, and the password is hashed with SHA-256. The job was to decrypt it. The password is longer than 7 characters and contains basic special characters ...

I then decided to build a powerful OpenCL brute-force system:

host (iterates over max_strlen - 6 characters) -> client (iterates over 2 characters) -> OpenCL device (iterates over 4 characters, one on the host, 3 on the device)

Every -> means that multiple connections are possible. The host -> client link is built on TCP/IPv4.
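To make the split concrete, here is a minimal sketch of how a candidate could be assembled from the three levels, treating each level's position as a number in base "charset size". The charset, the function names and the index layout are all assumptions for illustration; the real program's charset and its one-plus-three split inside the device stage are not shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical charset; the real one is not given in the post. */
static const char CHARSET[] = "abcdefghijklmnopqrstuvwxyz";
#define CHARSET_LEN (sizeof CHARSET - 1)

/* Write `len` characters to `out`, reading `index` as a base-CHARSET_LEN
   number with the most significant digit first. */
static void index_to_chars(unsigned long long index, char *out, size_t len)
{
    for (size_t i = len; i-- > 0;) {
        out[i] = CHARSET[index % CHARSET_LEN];
        index /= CHARSET_LEN;
    }
}

/* Assemble one candidate: the host supplies the prefix (max_strlen - 6
   characters), the client the next 2, the OpenCL device stage the last 4.
   `out` must hold at least prefix_len + 7 bytes. */
static void build_candidate(size_t prefix_len,
                            unsigned long long host_idx,
                            unsigned long long client_idx,
                            unsigned long long device_idx,
                            char *out)
{
    index_to_chars(host_idx, out, prefix_len);
    index_to_chars(client_idx, out + prefix_len, 2);
    index_to_chars(device_idx, out + prefix_len + 2, 4);
    out[prefix_len + 6] = '\0';
}
```

With a prefix length of 0 and all indices 0, this yields the six-character string "aaaaaa", which matches the test case mentioned in this post.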

The server part works as designed.

So does the OpenCL code.

The client code does what it should do. I have already tried to decode the string "aaaaaa": it found a solution on OpenCL device 0 and reported it to the server.

Now, the problem. It occurs on this server, which contains the following hardware:

Number of platforms:                         2
  Platform Profile:                          FULL_PROFILE
  Platform Version:                          OpenCL 2.1 AMD-APP (2527.3)
  Platform Name:                             AMD Accelerated Parallel Processing
  Platform Vendor:                           Advanced Micro Devices, Inc.
  Platform Extensions:                       cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Profile:                          FULL_PROFILE
  Platform Version:                          OpenCL 1.2 pocl 1.0, LLVM 5.0.1
  Platform Name:                             Portable Computing Language
  Platform Vendor:                           The pocl project
  Platform Extensions:                       cl_khr_icd

  Platform Name:                             AMD Accelerated Parallel Processing
Number of devices:                           3
  Device Type:                               CL_DEVICE_TYPE_GPU
  Vendor ID:                                 1002h
  Board name:                                AMD Radeon Graphics
  Device Topology:                           PCI[ B#4, D#0, F#0 ]
  Max compute units:                         36
  Max work items dimensions:                 3
Max work items[0]:                       1024
Max work items[1]:                       1024
Max work items[2]:                       1024
  Max work group size:                       256
  Preferred vector width char:               4
  Preferred vector width short:              2
  Preferred vector width int:                1
  Preferred vector width long:               1
  Preferred vector width float:              1
  Preferred vector width double:             1
  Native vector width char:                  4
  Native vector width short:                 2
  Native vector width int:                   1
  Native vector width long:                  1
  Native vector width float:                 1
  Native vector width double:                1
  Max clock frequency:                       300Mhz
  Address bits:                              64
  Max memory allocation:                     3422315315
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                        16384
  Max image 2D height:                       16384
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          2048
  Minimum alignment (bytes) for any datatype:128

 

  Device Type:                               CL_DEVICE_TYPE_GPU
  Vendor ID:                                 1002h
  Board name:                                AMD Radeon (TM) R9 Fury Series
  Device Topology:                           PCI[ B#66, D#0, F#0 ]
  Max compute units:                         64
  Max work items dimensions:                 3
Max work items[0]:                       1024
Max work items[1]:                       1024
Max work items[2]:                       1024
  Max work group size:                       256
  Preferred vector width char:               4
  Preferred vector width short:              2
  Preferred vector width int:                1
  Preferred vector width long:               1
  Preferred vector width float:              1
  Preferred vector width double:             1
  Native vector width char:                  4
  Native vector width short:                 2
  Native vector width int:                   1
  Native vector width long:                  1
  Native vector width float:                 1
  Native vector width double:                1
  Max clock frequency:                       300Mhz
  Address bits:                              64
  Max memory allocation:                     3422315315
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                        16384
  Max image 2D height:                       16384
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          2048
  Minimum alignment (bytes) for any datatype:128

 

  Device Type:                               CL_DEVICE_TYPE_GPU
  Vendor ID:                                 1002h
  Board name:                                AMD Radeon R9 200 Series
  Device Topology:                           PCI[ B#65, D#0, F#0 ]
  Max compute units:                         28
  Max work items dimensions:                 3
Max work items[0]:                       1024
Max work items[1]:                       1024
Max work items[2]:                       1024
  Max work group size:                       256
  Preferred vector width char:               4
  Preferred vector width short:              2
  Preferred vector width int:                1
  Preferred vector width long:               1
  Preferred vector width float:              1
  Preferred vector width double:             1
  Native vector width char:                  4
  Native vector width short:                 2
  Native vector width int:                   1
  Native vector width long:                  1
  Native vector width float:                 1
  Native vector width double:                1
  Max clock frequency:                       300Mhz
  Address bits:                              64
  Max memory allocation:                     1596954214
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                        16384
  Max image 2D height:                       16384
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          2048
  Minimum alignment (bytes) for any datatype:128

  Platform Name:                             Portable Computing Language
Number of devices:                           1
  Device Type:                               CL_DEVICE_TYPE_CPU
  Vendor ID:                                 1022h
  Max compute units:                         32
  Max work items dimensions:                 3
Max work items[0]:                       4096
Max work items[1]:                       4096
Max work items[2]:                       4096
  Max work group size:                       4096
  Preferred vector width char:               16
  Preferred vector width short:              16
  Preferred vector width int:                8
  Preferred vector width long:               4
  Preferred vector width float:              8
  Preferred vector width double:             4
  Native vector width char:                  16
  Native vector width short:                 16
  Native vector width int:                   8
  Native vector width long:                  4
  Native vector width float:                 8
  Native vector width double:                4
  Max clock frequency:                       2300Mhz
  Address bits:                              64
  Max memory allocation:                     34359738368
  Image support:                             Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      128
  Max image 2D width:                        32768
  Max image 2D height:                       32768
  Max image 3D width:                        2048
  Max image 3D height:                       2048
  Max image 3D depth:                        2048
  Max samplers within kernel:                16
  Max size of kernel argument:               1024
  Alignment (bits) of base address:          1024
  Minimum alignment (bytes) for any datatype:128

(output of clinfo, somewhat stripped)

I have a strange problem: the GPU devices cause a segmentation fault while executing clEnqueueNDRangeKernel() from the event callback.

This does not happen on a laptop where the Intel OpenCL SDK is installed together with the NVIDIA drivers.

I am not saying it's AMD's fault, as I simply don't know. I suspect a memory leak:

In the 'start' function of the OpenCL program:

Breakpoint 1, start_with_events (in_devs=0x60c4b0, amount=4) at src/openclhost.c:361
361             err = clEnqueueNDRangeKernel(in_devs.command_queue, in_devs.kernels[in_devs.hosting_iterator->current_char % 4], 3, NULL, globalworkids, localworkids, 0, NULL, &(in_devs.event));
(gdb) next
362             if(err != CL_SUCCESS) {
(gdb) next
366             err = clSetEventCallback(in_devs.event, CL_COMPLETE, nextevent, pass);
(gdb) next
367                     if(err != CL_SUCCESS) {
(gdb) print in_devs.event
$1 = (cl_event) 0x12f4440

In the callback function that starts the next NDRangeKernel:

(gdb) continue
Continuing.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffbcc45f700 (LWP 927)]
0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
(gdb) print in_devs.event
No symbol "in_devs" in current context.
(gdb) up
#1  0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
(gdb) up
#2  0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
(gdb) up
#3  0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
(gdb) up
#4  0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
(gdb) up
#5  0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
107             err = clEnqueueNDRangeKernel(device->command_queue, device->kernels[kernelnr], 3, NULL, gwids, lwids, 1, &(process->multiple_use), &(device->event));
(gdb) print in_devs.event
No symbol "in_devs" in current context.
(gdb) print device->event
$2 = (cl_event) 0x0
(gdb) print process->deviceid
$3 = 0
(gdb) bt
#0  0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
#1  0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
#2  0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#3  0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
#4  0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
#5  0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
#6  0x00007ffff3465619 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#7  0x00007ffff34656c2 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#8  0x00007ffff34b64f8 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#9  0x00007ffff34b6756 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#10 0x00007ffff34682e6 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#11 0x00007ffff346898d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#12 0x00007ffff33fd67f in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#13 0x00007ffff347998c in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#14 0x00007ffff79a28c7 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff76e348f in clone () from /lib64/libc.so.6
(gdb)

Any ideas where an OpenCL event is set to 0?

I also tried valgrind, but it crashes with an illegal instruction. No idea how to handle this...

Any ideas?

1 Solution
jpsollie
Adept II

I found the answer:

The event object was declared in function-local memory, i.e. as a local variable, so it no longer existed by the time the callback used it. After allocating it with malloc() instead, everything was fine.

sorry for the long post
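For anyone hitting the same thing, here is a plain-C sketch of the lifetime rule behind the fix, deliberately written without OpenCL so it compiles anywhere. Everything in it is hypothetical: device_state stands in for the per-device struct, the int field stands in for the cl_event, and register_callback/fire_pending imitate a runtime that calls back later, the way clSetEventCallback does from a driver thread. It is an illustration of the idea, not the code from my program.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device state. */
typedef struct {
    int event;     /* stands in for the cl_event the next enqueue writes */
    int launches;  /* how many times the callback has run */
} device_state;

typedef void (*callback_fn)(void *user_data);

/* A one-slot fake "runtime": it stores a callback and runs it later,
   after the function that registered it has already returned. */
static callback_fn pending_cb;
static void *pending_data;

static void register_callback(callback_fn cb, void *user_data)
{
    pending_cb = cb;
    pending_data = user_data;
}

static void fire_pending(void)
{
    if (pending_cb) {
        callback_fn cb = pending_cb;
        pending_cb = NULL;
        cb(pending_data);
    }
}

/* The callback writes through the pointer it was given. If that pointer
   referred to a local variable of start(), this write would land in a
   dead stack frame and corrupt whatever lives there now. */
static void next_launch(void *user_data)
{
    device_state *dev = user_data;
    dev->event = 42;
    dev->launches++;
}

/* Heap allocation keeps the state valid until the caller frees it,
   so it outlives start() and is still there when the callback fires. */
static device_state *start(void)
{
    device_state *dev = calloc(1, sizeof *dev);
    if (dev)
        register_callback(next_launch, dev);
    return dev;
}
```

In the real program the same rule applies to whatever is passed as user_data to clSetEventCallback and to the address given as the event argument of clEnqueueNDRangeKernel: both must stay valid until the last callback has run, and only then be freed.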
