jsnyder
Adept I

Can't Debug Kernel in Teapot Example

I have an HP Z820 configured with an Intel Xeon E5-2650, 32GB RAM, and an MSI Radeon R9 280X Gaming 6G card. I am running Windows 8.1 Pro, and have Visual Studio Premium 2013 Update 3, AMD Catalyst Version 14.9 (Driver 14.301.1001-140915a-176154C), AMD APP SDK 2.9-1, and CodeXL 1.5.6571 installed.

I am trying to debug a kernel in the AMD Teapot example. I compile the AMDTTeaPot project in the Debug configuration. When I run without any breakpoints, the application runs at 127 FPS.

When I put a breakpoint in the kernel applyBuoyancy() by clicking to the left of line 20 in tpApplyBuoyancy.cl, the application slows down to 23 FPS and I observe a lot of pairs of threads being created and then destroyed. The application does not break.

If I remove that breakpoint, continue, and use the CodeXL menu to put a New CodeXL Breakpoint in the Kernel Function applyBuoyancy(), the application breaks just before the call to _clEnqueueReleaseGLObjects() on line 2303 of AMDTTeapotOCLSmokeSystem.cpp. Stepping does not move the cursor.

If I remove the previous breakpoint, continue, and add a New CodeXL Breakpoint in clEnqueueNDRangeKernel(), the application breaks just before the call to _clEnqueueNDRangeKernel() on line 2112 of AMDTTeapotOCLSmokeSystem.cpp. Attempting to step brings up a dialog box, which says:

The Process was suspended before a kernel enqueued for debug has started executing.

Disable all API function breakpoints and resume debugging(F5) to continue into the kernel.

After deleting all breakpoints and hitting F5, a dialog box appears with the message,

Could not debug kernel. Error during kernel debugging.

A coworker with a similar setup, but a different card (R9 290X), has no trouble putting a breakpoint in applyBuoyancy(). His version of Visual Studio appears to load the same DLLs as mine does during program invocation. My "CodeXLServers-<userid>.log" file is the same as his up until the very end of his file. The last line in agreement between the two files looks like this:

2014.11.26    09:49:48.828    #14418177828    #ERROR    #0    #3836    #gsSamplersMonitor::updateContextDataSnapshot    #src\gsSamplersMonitor.cpp    #291    #Assertion failure (m_glGetSamplerParameteriv != 0 && m_glGetSamplerParameterfv != 0)

After that, my log file contains a large number of repetitions of this error:

2014.11.26    09:49:48.287    #14418360287    #ERROR    #0    #4268    #csDWARFParser::getAddressScope    #src\csDWARFParser.cpp    #2926    #Assertion failure (pAddressScope != 0)
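For anyone comparing two CodeXL server logs in the same way, here is a minimal Python sketch that tallies repeated errors and finds the first entry where two logs stop agreeing. It assumes each log entry sits on a single line with the '#'-separated layout shown above. The field names (severity, function, file, line, message) are guesses based on these two entries, not a documented CodeXL format, and the function names are made up for illustration.

```python
from collections import Counter
from itertools import zip_longest


def parse_line(line):
    """Split one log line into fields; return None if it does not fit the layout.

    Assumed layout (inferred from the entries in this thread):
    timestamp #id #severity #? #? #function #file #line #message
    """
    parts = [p.strip() for p in line.split("#", 8)]
    if len(parts) != 9:
        return None
    return {
        "timestamp": parts[0],
        "severity": parts[2],
        "function": parts[5],
        "file": parts[6],
        "line": parts[7],
        "message": parts[8],
    }


def tally_errors(lines):
    """Count ERROR entries grouped by (function, message)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["severity"] == "ERROR":
            counts[(rec["function"], rec["message"])] += 1
    return counts


def first_divergence(lines_a, lines_b):
    """Return the 0-based index of the first entry where two logs differ.

    Timestamps and numeric IDs are ignored, since they differ between runs.
    Returns None when both logs agree for their whole length.
    """
    def key(line):
        rec = parse_line(line)
        if rec is None:
            return line.strip()
        return (rec["function"], rec["line"], rec["message"])

    for index, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        if a is None or b is None or key(a) != key(b):
            return index
    return None


if __name__ == "__main__":
    # The two entries quoted in this post.
    mine = [
        r"2014.11.26    09:49:48.828    #14418177828    #ERROR    #0    #3836    #gsSamplersMonitor::updateContextDataSnapshot    #src\gsSamplersMonitor.cpp    #291    #Assertion failure (m_glGetSamplerParameteriv != 0 && m_glGetSamplerParameterfv != 0)",
        r"2014.11.26    09:49:48.287    #14418360287    #ERROR    #0    #4268    #csDWARFParser::getAddressScope    #src\csDWARFParser.cpp    #2926    #Assertion failure (pAddressScope != 0)",
    ]
    for (function, message), count in tally_errors(mine).items():
        print(count, function, message)
```

Reading both log files with `open(path).read().splitlines()` and passing the two lists to `first_divergence` would point at the first entry that appears in only one of the logs.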

Can you help me figure out what is going on?

Message was edited by: Jeffrey Snyder

I corrected my comment about the behavior when breaking on clEnqueueNDRangeKernel(). I had mistakenly said that the application breaks on _clEnqueueReleaseGLObjects(), but that was a copy/paste error from the previous case.

6 Replies
dorono
Staff

Re: Can't Debug Kernel in Teapot Example

Hi Jeffrey,

Can you run clinfo on your machine and upload the results?

Thanks

jsnyder
Adept I

Re: Can't Debug Kernel in Teapot Example

Thanks for offering to help. This is what I get by running C:\Windows\System32\clinfo.exe:

C:\Windows\System32>clinfo

Number of platforms:                             2

  Platform Profile:                              FULL_PROFILE

  Platform Version:                              OpenCL 1.1 CUDA 6.0.1

  Platform Name:                                 NVIDIA CUDA

  Platform Vendor:                               NVIDIA Corporation

  Platform Extensions:                           cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll

  Platform Profile:                              FULL_PROFILE

  Platform Version:                              OpenCL 1.2 AMD-APP (1573.4)

  Platform Name:                                 AMD Accelerated Parallel Processing

  Platform Vendor:                               Advanced Micro Devices, Inc.

  Platform Extensions:                           cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices

  Platform Name:                                 NVIDIA CUDA

Number of devices:                               1

  Device Type:                                   CL_DEVICE_TYPE_GPU

  Vendor ID:                                     10deh

  Max compute units:                             1

  Max work items dimensions:                     3

    Max work items[0]:                           1024

    Max work items[1]:                           1024

    Max work items[2]:                           64

  Max work group size:                           1024

  Preferred vector width char:                   1

  Preferred vector width short:                  1

  Preferred vector width int:                    1

  Preferred vector width long:                   1

  Preferred vector width float:                  1

  Preferred vector width double:                 1

  Native vector width char:                      1

  Native vector width short:                     1

  Native vector width int:                       1

  Native vector width long:                      1

  Native vector width float:                     1

  Native vector width double:                    1

  Max clock frequency:                           875Mhz

  Address bits:                                  32

  Max memory allocation:                         268435456

  Image support:                                 Yes

  Max number of images read arguments:           256

  Max number of images write arguments:          16

  Max image 2D width:                            32768

  Max image 2D height:                           32768

  Max image 3D width:                            4096

  Max image 3D height:                           4096

  Max image 3D depth:                            4096

  Max samplers within kernel:                    32

  Max size of kernel argument:                   4352

  Alignment (bits) of base address:              4096

  Minimum alignment (bytes) for any datatype:    128

  Single precision floating point capability

    Denorms:                                     Yes

    Quiet NaNs:                                  Yes

    Round to nearest even:                       Yes

    Round to zero:                               Yes

    Round to +ve and infinity:                   Yes

    IEEE754-2008 fused multiply-add:             Yes

  Cache type:                                    Read/Write

  Cache line size:                               128

  Cache size:                                    16384

  Global memory size:                            1073741824

  Constant buffer size:                          65536

  Max number of constant args:                   9

  Local memory type:                             Scratchpad

  Local memory size:                             49152

  Kernel Preferred work group size multiple:     32

  Error correction support:                      0

  Unified memory for Host and Device:            0

  Profiling timer resolution:                    1000

  Device endianess:                              Little

  Available:                                     Yes

  Compiler available:                            Yes

  Execution capabilities:

    Execute OpenCL kernels:                      Yes

    Execute native function:                     No

  Queue properties:

    Out-of-Order:                                Yes

    Profiling :                                  Yes

  Platform ID:                                   000000CE6A7EEEA0

  Name:                                          Quadro K600

  Vendor:                                        NVIDIA Corporation

  Device OpenCL C version:                       OpenCL C 1.1

  Driver version:                                332.50

  Profile:                                       FULL_PROFILE

  Version:                                       OpenCL 1.1 CUDA

  Extensions:                                    cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64

  Platform Name:                                 AMD Accelerated Parallel Processing

Number of devices:                               2

  Device Type:                                   CL_DEVICE_TYPE_GPU

  Vendor ID:                                     1002h

  Board name:                                    AMD Radeon R9 200 Series

  Device Topology:                               PCI[ B#5, D#0, F#0 ]

  Max compute units:                             32

  Max work items dimensions:                     3

    Max work items[0]:                           256

    Max work items[1]:                           256

    Max work items[2]:                           256

  Max work group size:                           256

  Preferred vector width char:                   4

  Preferred vector width short:                  2

  Preferred vector width int:                    1

  Preferred vector width long:                   1

  Preferred vector width float:                  1

  Preferred vector width double:                 1

  Native vector width char:                      4

  Native vector width short:                     2

  Native vector width int:                       1

  Native vector width long:                      1

  Native vector width float:                     1

  Native vector width double:                    1

  Max clock frequency:                           1020Mhz

  Address bits:                                  64

  Max memory allocation:                         4244635648

  Image support:                                 Yes

  Max number of images read arguments:           128

  Max number of images write arguments:          8

  Max image 2D width:                            16384

  Max image 2D height:                           16384

  Max image 3D width:                            2048

  Max image 3D height:                           2048

  Max image 3D depth:                            2048

  Max samplers within kernel:                    16

  Max size of kernel argument:                   1024

  Alignment (bits) of base address:              2048

  Minimum alignment (bytes) for any datatype:    128

  Single precision floating point capability

    Denorms:                                     No

    Quiet NaNs:                                  Yes

    Round to nearest even:                       Yes

    Round to zero:                               Yes

    Round to +ve and infinity:                   Yes

    IEEE754-2008 fused multiply-add:             Yes

  Cache type:                                    Read/Write

  Cache line size:                               64

  Cache size:                                    16384

  Global memory size:                            6442450944

  Constant buffer size:                          65536

  Max number of constant args:                   8

  Local memory type:                             Scratchpad

  Local memory size:                             32768

  Kernel Preferred work group size multiple:     64

  Error correction support:                      0

  Unified memory for Host and Device:            0

  Profiling timer resolution:                    1

  Device endianess:                              Little

  Available:                                     Yes

  Compiler available:                            Yes

  Execution capabilities:

    Execute OpenCL kernels:                      Yes

    Execute native function:                     No

  Queue properties:

    Out-of-Order:                                No

    Profiling :                                  Yes

  Platform ID:                                   00007FF9D5D5BFB0

  Name:                                          Tahiti

  Vendor:                                        Advanced Micro Devices, Inc.

  Device OpenCL C version:                       OpenCL C 1.2

  Driver version:                                1573.4 (VM)

  Profile:                                       FULL_PROFILE

  Version:                                       OpenCL 1.2 AMD-APP (1573.4)

  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

  Device Type:                                   CL_DEVICE_TYPE_CPU

  Vendor ID:                                     1002h

  Board name:

  Max compute units:                             16

  Max work items dimensions:                     3

    Max work items[0]:                           1024

    Max work items[1]:                           1024

    Max work items[2]:                           1024

  Max work group size:                           1024

  Preferred vector width char:                   16

  Preferred vector width short:                  8

  Preferred vector width int:                    4

  Preferred vector width long:                   2

  Preferred vector width float:                  8

  Preferred vector width double:                 4

  Native vector width char:                      16

  Native vector width short:                     8

  Native vector width int:                       4

  Native vector width long:                      2

  Native vector width float:                     8

  Native vector width double:                    4

  Max clock frequency:                           2594Mhz

  Address bits:                                  64

  Max memory allocation:                         8571055104

  Image support:                                 Yes

  Max number of images read arguments:           128

  Max number of images write arguments:          8

  Max image 2D width:                            8192

  Max image 2D height:                           8192

  Max image 3D width:                            2048

  Max image 3D height:                           2048

  Max image 3D depth:                            2048

  Max samplers within kernel:                    16

  Max size of kernel argument:                   4096

  Alignment (bits) of base address:              1024

  Minimum alignment (bytes) for any datatype:    128

  Single precision floating point capability

    Denorms:                                     Yes

    Quiet NaNs:                                  Yes

    Round to nearest even:                       Yes

    Round to zero:                               Yes

    Round to +ve and infinity:                   Yes

    IEEE754-2008 fused multiply-add:             Yes

  Cache type:                                    Read/Write

  Cache line size:                               64

  Cache size:                                    32768

  Global memory size:                            34284220416

  Constant buffer size:                          65536

  Max number of constant args:                   8

  Local memory type:                             Global

  Local memory size:                             32768

  Kernel Preferred work group size multiple:     1

  Error correction support:                      0

  Unified memory for Host and Device:            1

  Profiling timer resolution:                    394

  Device endianess:                              Little

  Available:                                     Yes

  Compiler available:                            Yes

  Execution capabilities:

    Execute OpenCL kernels:                      Yes

    Execute native function:                     Yes

  Queue properties:

    Out-of-Order:                                No

    Profiling :                                  Yes

  Platform ID:                                   00007FF9D5D5BFB0

  Name:                                                Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

  Vendor:                                        GenuineIntel

  Device OpenCL C version:                       OpenCL C 1.2

  Driver version:                                1573.4 (sse2,avx)

  Profile:                                       FULL_PROFILE

  Version:                                       OpenCL 1.2 AMD-APP (1573.4)

  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_spir cl_khr_gl_event
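Since this machine exposes two OpenCL platforms (NVIDIA and AMD) and three devices, a short summary is easier to scan than the full dump. The following is a minimal Python sketch that pulls the platform, device type, and device name out of clinfo text. It assumes the output was captured without console wrapping (for example by redirecting clinfo to a file) and that the key names match the ones printed above. The function name `list_devices` is made up for illustration.

```python
def list_devices(clinfo_text):
    """Return a list of (platform name, device type, device name) tuples.

    Relies on the ordering seen in the output above: a "Platform Name" line
    precedes each "Number of devices" section, and every device section
    starts with "Device Type" and later contains a "Name" line.
    """
    devices = []
    platform = None
    device_type = None
    for raw in clinfo_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank lines and headings without a value
        key, value = key.strip(), value.strip()
        if key == "Platform Name":
            platform = value
        elif key == "Device Type":
            device_type = value
        elif key == "Name":
            # "Board name" and "Platform Name" do not match this key.
            devices.append((platform, device_type, value))
    return devices


if __name__ == "__main__":
    import sys

    # Usage (file name is only an example): python list_devices.py clinfo.txt
    with open(sys.argv[1]) as handle:
        for platform, device_type, name in list_devices(handle.read()):
            print(f"{platform} | {device_type} | {name}")
```

Applied to the output in this post, the summary would contain the Quadro K600 under the NVIDIA platform, and Tahiti plus the Xeon CPU under the AMD platform.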

jsnyder
Adept I

Re: Can't Debug Kernel in Teapot Example

I would really appreciate it if someone could help with this problem. I have upgraded to CodeXL 1.6, but I am still having the previously mentioned problem. I have tried different GPUs with the same result. I am unable to stop in a kernel using either the Visual Studio 2013 interface or the CodeXL console interface. When running the CodeXL console, I can start and run the Teapot example. When I put a breakpoint in applyBuoyancy(), the application stops. Here is the content of the "Debugged Process events" tab:

<<

Thread Created: 2848

DLL Loaded: C:\Windows\SysWOW64\ntdll.dll

DLL Loaded: C:\Windows\SysWOW64\kernel32.dll

DLL Loaded: C:\Windows\SysWOW64\KernelBase.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\spies\opengl32.dll

DLL Loaded: C:\Windows\SysWOW64\user32.dll

DLL Loaded: C:\Windows\SysWOW64\gdi32.dll

DLL Loaded: C:\Windows\SysWOW64\shell32.dll

DLL Loaded: C:\Windows\SysWOW64\msvcr120.dll

DLL Loaded: C:\Windows\SysWOW64\ddraw.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\AMDTServerUtilities.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\AMDTBaseTools.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\AMDTOSWrappers.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\AMDTApiClasses.dll

DLL Loaded: C:\Windows\SysWOW64\ole32.dll

DLL Loaded: C:\Windows\SysWOW64\msvcp120.dll

DLL Loaded: C:\Windows\SysWOW64\msvcrt.dll

DLL Loaded: C:\Windows\SysWOW64\combase.dll

DLL Loaded: C:\Windows\SysWOW64\shlwapi.dll

DLL Loaded: C:\Windows\SysWOW64\dciman32.dll

DLL Loaded: C:\Windows\SysWOW64\ws2_32.dll

DLL Loaded: C:\Windows\SysWOW64\version.dll

DLL Loaded: C:\Windows\SysWOW64\dbghelp.dll

DLL Loaded: C:\Windows\SysWOW64\advapi32.dll

DLL Loaded: C:\Windows\SysWOW64\oleaut32.dll

DLL Loaded: C:\Windows\SysWOW64\propsys.dll

DLL Loaded: C:\Windows\SysWOW64\rpcrt4.dll

DLL Loaded: C:\Windows\SysWOW64\sechost.dll

DLL Loaded: C:\Windows\SysWOW64\nsi.dll

DLL Loaded: C:\Windows\SysWOW64\sspicli.dll

DLL Loaded: C:\Windows\SysWOW64\cryptbase.dll

DLL Loaded: C:\Windows\SysWOW64\bcryptprimitives.dll

DLL Loaded: C:\Windows\SysWOW64\imm32.dll

DLL Loaded: C:\Windows\SysWOW64\msctf.dll

API Connection Established: CodeXL Servers Manager

Thread Created: 5504

Process Run Started

Thread Created: 5556

Thread Terminated: 5556

DLL Loaded: C:\Windows\SysWOW64\uxtheme.dll

DLL Loaded: C:\Windows\SysWOW64\dwmapi.dll

DLL Loaded: C:\Windows\SysWOW64\kernel.appcore.dll

DLL Loaded: C:\Windows\SysWOW64\SHCore.dll

DLL Loaded: C:\Windows\SysWOW64\opengl32.dll

DLL Loaded: C:\Windows\SysWOW64\glu32.dll

API Connection Established: CodeXL OpenGL Server

Debug String: CodeXL OpenGL Server was initialized

DLL Loaded: C:\Windows\SysWOW64\comctl32.dll

DLL Loaded: C:\Windows\SysWOW64\atiglpxx.dll

DLL Loaded: C:\Windows\SysWOW64\atioglxx.dll

DLL Loaded: C:\Windows\SysWOW64\setupapi.dll

DLL Loaded: C:\Windows\SysWOW64\cfgmgr32.dll

DLL Loaded: C:\Windows\SysWOW64\winmm.dll

DLL Loaded: C:\Windows\SysWOW64\winmmbase.dll

DLL Loaded: C:\Windows\SysWOW64\devobj.dll

DLL Loaded: C:\Windows\SysWOW64\atiadlxy.dll

DLL Loaded: C:\Windows\SysWOW64\userenv.dll

DLL Loaded: C:\Windows\SysWOW64\wtsapi32.dll

DLL Loaded: C:\Windows\SysWOW64\psapi.dll

DLL Loaded: C:\Windows\SysWOW64\IPHLPAPI.DLL

DLL Loaded: C:\Windows\SysWOW64\profapi.dll

DLL Loaded: C:\Windows\SysWOW64\winnsi.dll

DLL Loaded: C:\Windows\SysWOW64\wintrust.dll

DLL Loaded: C:\Windows\SysWOW64\crypt32.dll

DLL Loaded: C:\Windows\SysWOW64\msasn1.dll

DLL Loaded: C:\Windows\SysWOW64\atigktxx.dll

DLL Unloaded: C:\Windows\SysWOW64\atigktxx.dll

DLL Loaded: C:\Windows\SysWOW64\atigktxx.dll

Thread Created: 3604

OpenGL Render Context 1 Created

DLL Loaded: C:\Windows\SysWOW64\OpenCL.dll

DLL Unloaded: C:\Windows\SysWOW64\OpenCL.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\spies\OpenCL.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\AMDTHsaDebugging.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\spies\detoured.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\AMDHwDbgFacilities.dll

DLL Loaded: C:\Windows\SysWOW64\OpenCL.dll

DLL Loaded: C:\Program Files (x86)\AMD\CodeXL\AMDOpenCLDebug.dll

DLL Loaded: C:\Windows\SysWOW64\aticalrt.dll

DLL Loaded: C:\Windows\SysWOW64\aticalcl.dll

DLL Loaded: C:\Windows\SysWOW64\aticaldd.dll

DLL Loaded: C:\Windows\SysWOW64\amdocl.dll

DLL Loaded: C:\Windows\SysWOW64\nvopencl.dll

DLL Loaded: C:\Windows\SysWOW64\nvapi.dll

DLL Loaded: C:\Windows\SysWOW64\atiumdva.dll

API Connection Established: CodeXL OpenCL Server

Debug String: CodeXL OpenCL Server was initialized

OpenCL Error on function: clGetGLContextInfoKHR

OpenCL Compute Context 1 Created

Thread Created: 3296

OpenCL Queue 1 (Context 1) Created

OpenCL Program 1 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 2 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 3 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 4 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 5 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 6 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 7 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 8 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 9 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 10 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 11 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 12 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 13 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 14 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 15 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 16 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

OpenCL Program 17 (Context 1) Created

Building OpenCL Program  (Context 1)...

Building OpenCL Program  (Context 1) Ended

            Build Log:

Debug String: CodeXL warning: The debugged process asked for an extension function pointer (glIsBuffer) from one render context, but called this function pointer in another render context (context #1)

Debug String: CodeXL warning: The debugged process asked for an extension function pointer (glGetBufferParameteriv) from one render context, but called this function pointer in another render context (context #1)

Thread Created: 5452

Thread Created: 256

Thread Terminated: 256

Thread Terminated: 5452

Step: clEnqueueReleaseGLObjects

>>
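The event list above is long enough that the interesting part is easy to miss: OpenCL.dll and opengl32.dll are each loaded from two places, the system folder and the CodeXL "spies" folder. Here is a minimal Python sketch that extracts the DLL events from text copied out of the "Debugged Process events" tab and reports any DLL loaded from more than one folder. It assumes the "DLL Loaded: " and "DLL Unloaded: " prefixes shown above. The function names are made up for illustration.

```python
import ntpath  # Windows path rules, usable on any host OS
from collections import defaultdict


def dll_events(lines):
    """Return (action, path) tuples for every DLL load or unload event."""
    events = []
    for line in lines:
        line = line.strip()
        for action in ("DLL Loaded", "DLL Unloaded"):
            prefix = action + ": "
            if line.startswith(prefix):
                events.append((action, line[len(prefix):]))
                break
    return events


def load_locations(lines):
    """Map each lower-cased DLL file name to the folders it was loaded from."""
    locations = defaultdict(list)
    for action, path in dll_events(lines):
        if action != "DLL Loaded":
            continue
        name = ntpath.basename(path).lower()
        folder = ntpath.dirname(path)
        if folder not in locations[name]:
            locations[name].append(folder)
    return dict(locations)


def multi_location(lines):
    """Return only the DLLs that were loaded from more than one folder."""
    return {
        name: folders
        for name, folders in load_locations(lines).items()
        if len(folders) > 1
    }
```

Running `load_locations` on the event lists from two machines and comparing the resulting dictionaries is one way to check whether both really load the same DLLs from the same folders.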

Here is the Call Stack:

wcsicmp - AMDTTeaPot.exe

sys_errlist - AMDTTeaPot.exe

wctype - AMDTTeaPot.exe

wctype - AMDTTeaPot.exe

wctype - AMDTTeaPot.exe

wctype - AMDTTeaPot.exe

wctype - USER32.dll

wctype - USER32.dll

wctype - USER32.dll

wctype - USER32.dll

wctype - AMDTTeaPot.exe

get_dstbias - AMDTTeaPot.exe

islower - KERNEL32.DLL

modf - ntdll.dll

Do any of you see anything in here which is indicative of a problem? I am wondering about the OpenGL messages myself.

Regards,

Jeff

urishomroni
Staff

Re: Can't Debug Kernel in Teapot Example

Hi Jeff,

1. You might want to upgrade your AMD driver to the new Catalyst Omega release; this may resolve the issue.

2. According to your log snippets, the error is apparently from the OpenCL program not having debug information in a supported format. This could stem from various reasons - is it possible that your machine has some specific settings pertaining to OpenCL build flags (e.g. environment variables)?

3. Could it be that the kernels are being executed on a non-AMD device or an AMD CPU (both these scenarios are supposed to be handled with a specific message, but something might have gone wrong)? This can be verified by:

A. Having a look in the Teapot application's menu and seeing which device is selected for each type of calculation.

B. Setting a CodeXL breakpoint on clEnqueueNDRangeKernel, and checking the device list in the OpenCL context properties.
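For point 2, a quick way to look for machine-specific OpenCL settings is to list environment variables whose names mention OpenCL. The sketch below is a minimal Python example under stated assumptions: the substrings it searches for ("OCL" and "OPENCL") are a guess at what is relevant, chosen so that variables such as AMD_OCL_BUILD_OPTIONS and AMD_OCL_BUILD_OPTIONS_APPEND would be caught if they are set. The function name is made up for illustration, and an empty result does not prove that no such setting exists elsewhere.

```python
import os

# Assumed substrings of interest; extend as needed.
NEEDLES = ("OCL", "OPENCL")


def suspicious_env(environ=None, needles=NEEDLES):
    """Return the environment variables whose names contain any needle.

    `environ` defaults to the current process environment; pass a plain
    dict to inspect a captured environment instead.
    """
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(env.items())
        if any(needle in name.upper() for needle in needles)
    }


if __name__ == "__main__":
    found = suspicious_env()
    if not found:
        print("No OpenCL-related environment variables found.")
    for name, value in found.items():
        print(f"{name}={value}")
```

Running it from the same account and shell (or Visual Studio command prompt) that launches the debugged application matters, because environment variables can differ between sessions.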

Regards,

jsnyder
Adept I

Re: Can't Debug Kernel in Teapot Example

I asked my coworker about their installation. They indicated that they did not have AMD APP SDK installed. I then uninstalled all AMD software (CodeXL 1.6, AMD APP SDK 2.9.1, Catalyst). I then installed Catalyst from "amd-catalyst-omega-14.12-with-dotnet45-win8.1-64bit.exe" and CodeXL from "AMD_CodeXL_Win_1.6.7249.exe". I did NOT reinstall AMD APP SDK.

I am now able to stop at breakpoints in kernels. CodeXL seems to be functional for me. I noticed that the date on "C:\Windows\SysWOW64\OpenCL.dll" and "C:\Windows\System32\OpenCL.dll" is now "11/20/2014 9:33 PM". Prior to my reinstall, the date was sometime in September 2014.
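To repeat the date check on another machine, a minimal Python sketch like the following prints the modification time of both copies of the ICD loader. The two paths are the ones named above; the function name is made up for illustration. One caveat: on 64-bit Windows, a 32-bit Python process is redirected from System32 to SysWOW64, so a 64-bit interpreter is needed to see both files separately.

```python
import os
from datetime import datetime


def modification_times(paths):
    """Map each path to its last-modified time, or None if it is unreadable."""
    result = {}
    for path in paths:
        try:
            result[path] = datetime.fromtimestamp(os.path.getmtime(path))
        except OSError:
            result[path] = None
    return result


if __name__ == "__main__":
    loader_paths = [
        r"C:\Windows\System32\OpenCL.dll",
        r"C:\Windows\SysWOW64\OpenCL.dll",
    ]
    for path, stamp in modification_times(loader_paths).items():
        print(path, stamp if stamp else "not found")
```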

I still don't know what was causing my problem. My best guess is that going from "OpenCL 1.2 AMD-APP (1573.4)" to "OpenCL 1.2 AMD-APP (1642.5)" helped.

jsnyder
Adept I

Re: Can't Debug Kernel in Teapot Example

Thanks, Uri.

I did not see your message this morning until I had already replied to myself. I actually was able to get CodeXL running last Friday with the reinstall described.

Your message does spark further questions, though. When I go through the CodeXL menu in Visual Studio and set a breakpoint in clEnqueueNDRangeKernel, no source is visible after breaking. I don't know how I am supposed to look at the OpenCL context properties. There is nothing on the CodeXL menu that would allow me to do this.

When I set the breakpoint through the Visual Studio Debug menu, I am not able to see context properties until I move up several levels in the call stack:

     OpenCL.dll!50501d40()    Unknown

     [Frames below may be incorrect and/or missing, no symbols loaded for OpenCL.dll]   

     AMDTTeaPot.exe!AMDTTeapotOCLSmokeSystem::compute(const AMDTTeapotRenderState & state, const Mat4 & modelTransformation, float deltaTimeSeconds, float deltaRot) Line 2121    C++

>    AMDTTeaPot.exe!AMDTTeapotOCLSmokeSystem::draw(const AMDTTeapotRenderState & state, const Mat4 & modelTransformation, float deltaRot) Line 1664    C++

     AMDTTeaPot.exe!AMDTTeapotOGLCanvas::paintWindow() Line 413    C++

     AMDTTeaPot.exe!AMDTTeapotOGLCanvas::onPaint() Line 154    C++

     AMDTTeaPot.exe!MainWin::onPaint() Line 138    C++

     AMDTTeaPot.exe!WndProc(HWND__ * hWnd, unsigned int message, unsigned int wParam, long lParam) Line 444    C++

     [External Code]

From the calling context of the draw() method (in this particular break), I am able to examine the global _programs variable to get the list of devices. There is only 1. It is "Tahiti". I am not able to examine _programs from any method above draw() in the call stack.

Since both 3A and 3B indicate "Tahiti" (and did previously before my complete reinstall), I don't think that device selection was my problem.

Regards,

Jeff
