Hello everyone,
The history of my program is somewhat strange, so I hope you will not be mad that I am not posting the full code.
For my wedding, I got a PDF that is protected with 256-bit encryption, and the password is hashed with SHA-256. The job was to decrypt it. The password is longer than 7 characters and contains basic special characters ...
I then decided to build a powerful OpenCL brute-force system:
host (iterates over max_strlen - 6 characters) -> client (iterates over 2 characters) -> OpenCL device (iterates over 4 characters: one on the host, 3 on the device)
Every -> means that multiple connections are possible. The -> between host and client is implemented over TCP/IPv4.
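To make the split a bit more concrete, here is a minimal sketch of how one candidate could be assembled from the pieces described above. It is not my real code: the charset, the function name build_candidate, and the idea that the 3 device characters map directly to the 3 NDRange dimensions are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alphabet; the real charset is not part of this post. */
static const char CHARSET[] = "abcdefghijklmnopqrstuvwxyz0123456789!?";
#define CHARSET_LEN (sizeof(CHARSET) - 1)

/*
 * Build one candidate password:
 *   prefix   - characters already fixed by the server and the client
 *   host_idx - the one character chosen by the OpenCL host code
 *   gx/gy/gz - the three characters a work-item would derive from its
 *              global ids (assumed mapping: one character per dimension)
 */
static void build_candidate(const char *prefix, size_t host_idx,
                            size_t gx, size_t gy, size_t gz,
                            char *out, size_t out_size)
{
    size_t n = strlen(prefix);

    /* Room for prefix + 4 characters + terminating NUL. */
    assert(n + 5 <= out_size);
    assert(host_idx < CHARSET_LEN && gx < CHARSET_LEN);
    assert(gy < CHARSET_LEN && gz < CHARSET_LEN);

    memcpy(out, prefix, n);
    out[n]     = CHARSET[host_idx];
    out[n + 1] = CHARSET[gx];
    out[n + 2] = CHARSET[gy];
    out[n + 3] = CHARSET[gz];
    out[n + 4] = '\0';
}
```

With a 2-character prefix "aa" from the client and all indices set to 0, this yields the 6-character string "aaaaaa".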
The server part works as designed, and so does the OpenCL code.
The client code does what it should do: I have already tried to recover the string "aaaaaa", and it found a solution on OpenCL device 0 and reported it to the server.
Now, the problem. It occurs on this server, which contains the following hardware:
Number of platforms: | 2 |
Platform Profile: | FULL_PROFILE |
Platform Version: | OpenCL 2.1 AMD-APP (2527.3) |
Platform Name: | AMD Accelerated Parallel Processing |
Platform Vendor: | Advanced Micro Devices, Inc. |
Platform Extensions: | cl_khr_icd cl_amd_event_callback cl_amd_offline_devices |
Platform Profile: | FULL_PROFILE |
Platform Version: | OpenCL 1.2 pocl 1.0, LLVM 5.0.1 |
Platform Name: | Portable Computing Language |
Platform Vendor: | The pocl project |
Platform Extensions: | cl_khr_icd |
Platform Name: | AMD Accelerated Parallel Processing | |
Number of devices: | 3 | |
Device Type: | CL_DEVICE_TYPE_GPU | |
Vendor ID: | 1002h | |
Board name: | AMD Radeon Graphics | |
Device Topology: | PCI[ B#4, D#0, F#0 ] | |
Max compute units: | 36 | |
Max work items dimensions: | 3 | |
Max work items[0]: | 1024 | |
Max work items[1]: | 1024 | |
Max work items[2]: | 1024 | |
Max work group size: | 256 | |
Preferred vector width char: | 4 | |
Preferred vector width short: | 2 | |
Preferred vector width int: | 1 | |
Preferred vector width long: | 1 | |
Preferred vector width float: | 1 | |
Preferred vector width double: | 1 | |
Native vector width char: | 4 | |
Native vector width short: | 2 | |
Native vector width int: | 1 | |
Native vector width long: | 1 | |
Native vector width float: | 1 | |
Native vector width double: | 1 | |
Max clock frequency: | 300Mhz | |
Address bits: | 64 | |
Max memory allocation: | 3422315315 | |
Image support: | Yes | |
Max number of images read arguments: | 128 | |
Max number of images write arguments: | 8 | |
Max image 2D width: | 16384 | |
Max image 2D height: | 16384 | |
Max image 3D width: | 2048 | |
Max image 3D height: | 2048 | |
Max image 3D depth: | 2048 | |
Max samplers within kernel: | 16 | |
Max size of kernel argument: | 1024 | |
Alignment (bits) of base address: | 2048 | |
Minimum alignment (bytes) for any datatype: | 128 |
Device Type: | CL_DEVICE_TYPE_GPU | |
Vendor ID: | 1002h | |
Board name: | AMD Radeon (TM) R9 Fury Series | |
Device Topology: | PCI[ B#66, D#0, F#0 ] | |
Max compute units: | 64 | |
Max work items dimensions: | 3 | |
Max work items[0]: | 1024 | |
Max work items[1]: | 1024 | |
Max work items[2]: | 1024 | |
Max work group size: | 256 | |
Preferred vector width char: | 4 | |
Preferred vector width short: | 2 | |
Preferred vector width int: | 1 | |
Preferred vector width long: | 1 | |
Preferred vector width float: | 1 | |
Preferred vector width double: | 1 | |
Native vector width char: | 4 | |
Native vector width short: | 2 | |
Native vector width int: | 1 | |
Native vector width long: | 1 | |
Native vector width float: | 1 | |
Native vector width double: | 1 | |
Max clock frequency: | 300Mhz | |
Address bits: | 64 | |
Max memory allocation: | 3422315315 | |
Image support: | Yes | |
Max number of images read arguments: | 128 | |
Max number of images write arguments: | 8 | |
Max image 2D width: | 16384 | |
Max image 2D height: | 16384 | |
Max image 3D width: | 2048 | |
Max image 3D height: | 2048 | |
Max image 3D depth: | 2048 | |
Max samplers within kernel: | 16 | |
Max size of kernel argument: | 1024 | |
Alignment (bits) of base address: | 2048 | |
Minimum alignment (bytes) for any datatype: | 128 |
Device Type: | CL_DEVICE_TYPE_GPU | |
Vendor ID: | 1002h | |
Board name: | AMD Radeon R9 200 Series | |
Device Topology: | PCI[ B#65, D#0, F#0 ] | |
Max compute units: | 28 | |
Max work items dimensions: | 3 | |
Max work items[0]: | 1024 | |
Max work items[1]: | 1024 | |
Max work items[2]: | 1024 | |
Max work group size: | 256 | |
Preferred vector width char: | 4 | |
Preferred vector width short: | 2 | |
Preferred vector width int: | 1 | |
Preferred vector width long: | 1 | |
Preferred vector width float: | 1 | |
Preferred vector width double: | 1 | |
Native vector width char: | 4 | |
Native vector width short: | 2 | |
Native vector width int: | 1 | |
Native vector width long: | 1 | |
Native vector width float: | 1 | |
Native vector width double: | 1 | |
Max clock frequency: | 300Mhz | |
Address bits: | 64 | |
Max memory allocation: | 1596954214 | |
Image support: | Yes | |
Max number of images read arguments: | 128 | |
Max number of images write arguments: | 8 | |
Max image 2D width: | 16384 | |
Max image 2D height: | 16384 | |
Max image 3D width: | 2048 | |
Max image 3D height: | 2048 | |
Max image 3D depth: | 2048 | |
Max samplers within kernel: | 16 | |
Max size of kernel argument: | 1024 | |
Alignment (bits) of base address: | 2048 | |
Minimum alignment (bytes) for any datatype: | 128 |
Platform Name: | Portable Computing Language | |
Number of devices: | 1 | |
Device Type: | CL_DEVICE_TYPE_CPU | |
Vendor ID: | 1022h | |
Max compute units: | 32 | |
Max work items dimensions: | 3 | |
Max work items[0]: | 4096 | |
Max work items[1]: | 4096 | |
Max work items[2]: | 4096 | |
Max work group size: | 4096 | |
Preferred vector width char: | 16 | |
Preferred vector width short: | 16 | |
Preferred vector width int: | 8 | |
Preferred vector width long: | 4 | |
Preferred vector width float: | 8 | |
Preferred vector width double: | 4 | |
Native vector width char: | 16 | |
Native vector width short: | 16 | |
Native vector width int: | 8 | |
Native vector width long: | 4 | |
Native vector width float: | 8 | |
Native vector width double: | 4 | |
Max clock frequency: | 2300Mhz | |
Address bits: | 64 | |
Max memory allocation: | 34359738368 | |
Image support: | Yes | |
Max number of images read arguments: | 128 | |
Max number of images write arguments: | 128 | |
Max image 2D width: | 32768 | |
Max image 2D height: | 32768 | |
Max image 3D width: | 2048 | |
Max image 3D height: | 2048 | |
Max image 3D depth: | 2048 | |
Max samplers within kernel: | 16 | |
Max size of kernel argument: | 1024 | |
Alignment (bits) of base address: | 1024 | |
Minimum alignment (bytes) for any datatype: | 128 |
(output of clinfo, somewhat stripped)
I have a strange problem: the GPU devices cause a segmentation fault while executing.
This does not happen on a laptop where the Intel OpenCL SDK is installed together with the NVIDIA drivers.
I am not saying it is AMD's fault, as I simply don't know. I suspect a memory leak:
In the 'start' function of the OpenCL program:
Breakpoint 1, start_with_events (in_devs=0x60c4b0, amount=4) at src/openclhost.c:361
361 err = clEnqueueNDRangeKernel(in_devs.command_queue, in_devs.kernels[in_devs.hosting_iterator->current_char % 4], 3, NULL, globalworkids, localworkids, 0, NULL, &(in_devs.event));
(gdb) next
362 if(err != CL_SUCCESS) {
(gdb) next
366 err = clSetEventCallback(in_devs.event, CL_COMPLETE, nextevent, pass);
(gdb) next
367 if(err != CL_SUCCESS) {
(gdb) print in_devs.event
$1 = (cl_event) 0x12f4440
In the callback function that starts the next NDRangeKernel:
(gdb) continue
Continuing.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffbcc45f700 (LWP 927)]
0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
(gdb) print in_devs.event
No symbol "in_devs" in current context.
(gdb) up
#1 0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
(gdb) up
#2 0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
(gdb) up
#3 0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
(gdb) up
#4 0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
(gdb) up
#5 0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
107 err = clEnqueueNDRangeKernel(device->command_queue, device->kernels[kernelnr], 3, NULL, gwids, lwids, 1, &(process->multiple_use), &(device->event));
(gdb) print in_devs.event
No symbol "in_devs" in current context.
(gdb) print device->event
$2 = (cl_event) 0x0
(gdb) print process->deviceid
$3 = 0
(gdb) bt
#0 0x00007ffff766d18b in _int_malloc () from /lib64/libc.so.6
#1 0x00007ffff766ecd8 in malloc () from /lib64/libc.so.6
#2 0x00007ffff57b1e3d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#3 0x00007ffff343c987 in clEnqueueNDRangeKernel () from /opt/amdgpu/lib64/libamdocl64.so
#4 0x00007ffff7bc4f22 in clEnqueueNDRangeKernel () from /usr/lib64/libOpenCL.so.1
#5 0x0000000000402404 in launchKernel (event=0x7fbed8310bf0, event_command_exec_status=0, user_data=0x1849be0) at src/openclhost.c:107
#6 0x00007ffff3465619 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#7 0x00007ffff34656c2 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#8 0x00007ffff34b64f8 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#9 0x00007ffff34b6756 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#10 0x00007ffff34682e6 in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#11 0x00007ffff346898d in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#12 0x00007ffff33fd67f in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#13 0x00007ffff347998c in ?? () from /opt/amdgpu/lib64/libamdocl64.so
#14 0x00007ffff79a28c7 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff76e348f in clone () from /lib64/libc.so.6
(gdb)
Any ideas where an OpenCL event could be set to 0?
I also tried valgrind, but it crashes with an illegal instruction. No idea how to handle this...
Any ideas?
I found the answer:
The event object was declared in function-local (stack) memory, so it was presumably no longer valid by the time the callback ran on the driver's thread. When I allocated it with malloc() instead, everything was fine.
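For anyone who runs into the same thing, here is a minimal sketch of the idea without any OpenCL in it. The names launch_ctx, register_callback, fire_pending and start_launch are made up for this illustration, and the tiny "runtime" only stands in for a library that calls you back after the registering function has already returned. It is not my actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-launch state that gets passed as user_data. */
typedef struct {
    int device_id;
    int launches;
} launch_ctx;

typedef void (*completion_cb)(void *user_data);

/* Toy stand-in for an asynchronous runtime: remembers one callback. */
static completion_cb pending_cb;
static void *pending_data;

static void register_callback(completion_cb cb, void *user_data)
{
    pending_cb = cb;
    pending_data = user_data;
}

/* The "runtime" fires the callback later, after start_launch() returned. */
static void fire_pending(void)
{
    completion_cb cb = pending_cb;
    void *data = pending_data;

    pending_cb = NULL;
    pending_data = NULL;
    if (cb != NULL)
        cb(data);
}

/* Callback: touches the context, so the context must still be alive. */
static void on_complete(void *user_data)
{
    launch_ctx *ctx = user_data;
    ctx->launches++;
}

/*
 * The context lives on the heap, so it is still valid when the callback
 * runs. A local "launch_ctx ctx;" here would be destroyed when this
 * function returns, and the callback would then use a dangling pointer.
 * The caller owns the returned pointer and frees it after the last callback.
 */
static launch_ctx *start_launch(int device_id)
{
    launch_ctx *ctx = malloc(sizeof *ctx);

    assert(ctx != NULL);
    ctx->device_id = device_id;
    ctx->launches = 0;
    register_callback(on_complete, ctx);
    return ctx;
}
```

The same rule applies to anything handed to a callback through user_data: it has to outlive the function that registered the callback, and it is freed only once no further callback can reference it.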
Sorry for the long post.