GPU debugging, i.e. stepping into kernels, does not work on Ubuntu 13.10 with CodeXL 1.5 (see the screenshot above). Here is my setup:
- Ubuntu 13.10 x86_64;
- GPU AMD R9 280X;
- graphics drivers 14.4 installed using the official installer;
- CodeXL 1.5 installed using the .deb package.
I've tried different ICD loaders, dynamically linking the corresponding libOpenCL.so:
- using the ICD loader provided by the AMD drivers or by the AMD APP SDK (v2.9.1), the OpenCL breakpoints are completely ignored: execution continues without stopping at them;
- using the ocl-icd provided by the official Ubuntu repositories (https://forge.imag.fr/projects/ocl-icd/), execution stops at breakpoints, but the debugger cannot step into kernels and an error window with the previously shown error opens. The following message is also printed twice on standard output:
/home/daniele/ieiit/workspace/hand_pose_recognition/branches/new_rf_optimized/random_forests/build/test_tree_trainer: /opt/AMD/CodeXL_1.5-5364/spies/libOpenCL.so.1: no version information available (required by /home/daniele/ieiit/workspace/hand_pose_recognition/branches/new_rf_optimized/random_forests/build/test_tree_trainer)
You can find the CodeXL log files attached.
Thanks for your help,
Similar issues to case (2) here. Same error message on the shell but no GUI error message.
I can definitely run the example problem and step into kernel code. However, CodeXL freezes when I do the same with my application. If I let it run, it unfreezes, but it is then not possible to stop the debugging process.
I am running Ubuntu 14.04 with the latest AMD drivers (Catalyst 14.4 rev2) and OpenCL SDK (1214.3) on an AMD Cedar GPU. Everything was installed from the tarballs.
I managed to at least hit breakpoints while linking against the libOpenCL provided by the drivers. As described in the release notes, "GPU Debugger breakpoints are not hit for Linux OpenCL applications that are built using DT_RPATH to locate the OpenCL runtime." It turns out that's exactly my case, since the output of readelf -d is the following:
Dynamic section at offset 0x77d98 contains 33 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libopencv_core.so.2.4]
0x0000000000000001 (NEEDED) Shared library: [libopencv_highgui.so.2.4]
0x0000000000000001 (NEEDED) Shared library: [libopencv_imgproc.so.2.4]
0x0000000000000001 (NEEDED) Shared library: [libboost_system.so.1.53.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_filesystem.so.1.53.0]
0x0000000000000001 (NEEDED) Shared library: [libOpenCL.so.1]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000f (RPATH) Library rpath: [/usr/lib/fglrx:/home/daniele/ieiit/workspace/hand_pose_recognition/branches/new_rf_optimized/random_forests]
0x000000000000000c (INIT) 0x441638
0x000000000000000d (FINI) 0x464674
0x0000000000000019 (INIT_ARRAY) 0x677d78
0x000000000000001b (INIT_ARRAYSZ) 16 (bytes)
0x000000000000001a (FINI_ARRAY) 0x677d88
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x400298
0x0000000000000005 (STRTAB) 0x40fdc8
0x0000000000000006 (SYMTAB) 0x403918
0x000000000000000a (STRSZ) 194202 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x678000
0x0000000000000002 (PLTRELSZ) 3888 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x440708
0x0000000000000007 (RELA) 0x4405b8
0x0000000000000008 (RELASZ) 336 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x4404c8
0x000000006fffffff (VERNEEDNUM) 5
0x000000006ffffff0 (VERSYM) 0x43f462
0x0000000000000000 (NULL) 0x0
Removing the RPATH entry with the chrpath tool allows the debugger to hit the OpenCL API breakpoints, but I'm still unable to step into kernel code (same error message as in the first post).
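For anyone who wants to check their own binaries for the same condition, here is a minimal Python sketch that scans the text printed by readelf -d for RPATH and RUNPATH entries. The function names (find_search_paths, uses_deprecated_rpath) are made up for this illustration, and the sample text is a shortened copy of the output above; it only parses text and does not run readelf or modify the binary.

```python
import re

# Matches dynamic-section lines such as:
#   0x000000000000000f (RPATH) Library rpath: [/usr/lib/fglrx:/some/dir]
#   0x000000000000001d (RUNPATH) Library runpath: [/usr/lib/fglrx]
_PATH_ENTRY = re.compile(
    r"\((RPATH|RUNPATH)\)\s+Library r(?:un)?path:\s+\[([^\]]*)\]"
)


def find_search_paths(readelf_output):
    """Return {'RPATH': [...], 'RUNPATH': [...]} parsed from `readelf -d` text."""
    found = {"RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _PATH_ENTRY.search(line)
        if match:
            tag, paths = match.groups()
            # Entries are colon-separated directories; skip empty pieces.
            found[tag].extend(p for p in paths.split(":") if p)
    return found


def uses_deprecated_rpath(readelf_output):
    """True if the binary carries a DT_RPATH entry (hypothetical helper)."""
    return bool(find_search_paths(readelf_output)["RPATH"])


# Shortened copy of the readelf output shown in this post.
sample = """\
 0x0000000000000001 (NEEDED) Shared library: [libOpenCL.so.1]
 0x000000000000000f (RPATH) Library rpath: [/usr/lib/fglrx:/home/daniele/ieiit/workspace/hand_pose_recognition/branches/new_rf_optimized/random_forests]
"""

if __name__ == "__main__":
    print(find_search_paths(sample)["RPATH"])
    print("DT_RPATH present:", uses_deprecated_rpath(sample))
```

If DT_RPATH is reported, the release-note limitation quoted above applies to that binary.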
1. You beat me to the RPATH part. I'll just add that you can also avoid this issue entirely (without resorting to chrpath) by adding "--enable-new-dtags" to your own application's linker flags (when linking through the gcc driver, that is "-Wl,--enable-new-dtags").
This is actually recommended in general, since DT_RPATH is deprecated and DT_RUNPATH is the replacement.
2. Regarding debugging the kernel itself, I could not find any telling hints in the logs.
2A. Are you able to debug the kernels in the SDK samples? The CodeXL teapot sample?
2B. If the other samples work, it might be that the issue is with the kernel itself not being debuggable (the message you've shown in your screenshot is the generic "something went wrong" message for that case, so it's hard to tell what happened). Common culprits are atomic operations and printf. Could you share the kernel sources?
Uri Shomroni, thank you for the RPATH hint. Actually, CMake adds it by default each time I specify a linking path for the OpenCL shared library ...
Regarding the kernel debugging, I'm able to step into kernels when running the Teapot sample. I've also tried another program developed in the past months (e.g. the classifier.cl attached to the message) and it can be debugged with no issues. The kernel code that is causing the problem is attached to the message. The feat_type.cl file is dynamically generated, saved in the /tmp directory and included using the -I compilation flag. There are no printfs or atomic operations. Note that all the CL sources can be debugged using the older CodeXL 1.3 release with the 13.101 driver packaged in the Ubuntu repositories. I hope this helps to track down the problem.
Thanks for sharing your sources.
We will try to reconstruct the issue in our labs.
In the meantime, try commenting out parts of the kernel to see if any specific area is the culprit; it might also clue you (and us) in to where the issue lies.
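One possible way to do this without hand-editing the file each time is to generate variants of the kernel source with a line range wrapped in #if 0 / #endif, which the OpenCL C preprocessor then skips. The sketch below is only an illustration: disable_lines is a made-up helper, and the demo kernel is a placeholder, not the attached source. A variant may fail to compile if the disabled lines declare something used later, so choose ranges accordingly.

```python
def disable_lines(source, first, last):
    """Wrap 1-based lines first..last (inclusive) in #if 0 / #endif."""
    lines = source.splitlines()
    if not (1 <= first <= last <= len(lines)):
        raise ValueError("line range out of bounds")
    return "\n".join(
        lines[: first - 1]          # untouched lines before the range
        + ["#if 0"]
        + lines[first - 1 : last]   # the disabled region
        + ["#endif"]
        + lines[last:]              # untouched lines after the range
    )


# Placeholder kernel used only to show the transformation.
kernel = """\
__kernel void demo(__global float *out)
{
    size_t i = get_global_id(0);
    float v = (float)i;
    v = v * 2.0f;
    out[i] = v;
}"""

# Disable only line 5 ("v = v * 2.0f;") and keep the rest compilable.
variant = disable_lines(kernel, 5, 5)

if __name__ == "__main__":
    print(variant)
```

Building each variant and retrying the step-into in CodeXL narrows down which region the debugger cannot handle.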