20 Replies Latest reply on Nov 3, 2009 4:27 AM by david_aiken

    debugging

    david_aiken

      There is a recent interview with some of the AMD devs (http://forums.amd.com/devblog/blogpost.cfm?catid=335&threadid=120276) which includes the comment "...the OpenCL CPU implementation leverages the CPU hardware debug features to provide excellent debug capabilities, using familiar debug environments, at full CPU speeds."

      I've probably missed it, but is there any debug support for Visual Studio 2008 on Vista planned for kernels running on the CPU, or perhaps within a GPU emulator? It would be great to catch kernel memory and build issues in Visual Studio.

        • debugging
          jmundy

          I second this query. Even without Visual Studio integration, is there a way to view kernel compiler error messages? At the moment, clBuildProgram only returns a numeric error code indicating that the program build failed.

            • debugging
              omkaranathan

               

              Originally posted by: jmundy is there a way to view kernel compiler error messages? Now there is just a numeric code returned that the program build failed when clBuildProgram is executed.


              You can get the build log using the clGetProgramBuildInfo() API call with the CL_PROGRAM_BUILD_LOG parameter.

                • debugging
                  david_aiken

                  Yes.. it's pretty close, but you get references like

                  C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set 

                  If you double-click on them in the output window they will navigate to the appropriate line in the editor.. or they would if the temporary file still existed. Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier.
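The path sweep described above does not strictly need a regex. Below is a minimal sketch in plain C, assuming the log text has already been fetched with clGetProgramBuildInfo() and that every diagnostic line starts with the temporary path and continues with ".tmp.cl(" as in the warning quoted above. The function name rewrite_build_log and the marker string are illustrative assumptions, not part of any SDK.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `log` in which every line containing
   ".tmp.cl(" has everything up to and including ".tmp.cl" replaced by
   `original_path`. Other lines are copied unchanged.
   The caller frees the result. Returns NULL if allocation fails. */
char *rewrite_build_log(const char *log, const char *original_path)
{
    static const char marker[] = ".tmp.cl(";   /* assumed temp-file suffix */
    const size_t marker_len = sizeof marker - 1;
    const size_t path_len = strlen(original_path);

    /* Worst case: every line grows by path_len bytes. */
    size_t lines = 1;
    for (const char *p = log; *p; ++p) {
        if (*p == '\n') ++lines;
    }
    char *out = malloc(strlen(log) + lines * path_len + 1);
    if (!out) return NULL;

    char *dst = out;
    const char *line = log;
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) + 1 : strlen(line);

        /* Look for the marker inside this line only. */
        const char *hit = NULL;
        for (size_t i = 0; i + marker_len <= line_len; ++i) {
            if (memcmp(line + i, marker, marker_len) == 0) {
                hit = line + i;
                break;
            }
        }

        if (hit) {
            const char *rest = hit + marker_len - 1;   /* points at '(' */
            size_t rest_len = line_len - (size_t)(rest - line);
            memcpy(dst, original_path, path_len);
            dst += path_len;
            memcpy(dst, rest, rest_len);
            dst += rest_len;
        } else {
            memcpy(dst, line, line_len);
            dst += line_len;
        }
        line += line_len;
    }
    *dst = '\0';
    return out;
}
```

Passing the absolute path of the original .cl file as original_path should let the Visual Studio output window navigate to the right line. The sketch handles one source file per log, which fits a program built from a single .cl file.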

                  This isn't a big issue for me currently. Catching subtle memory overwrites is. I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al.) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate. They are NVidia kernels so I'm waiting for permission to post it here. If there is some way to use the AMD source or an emulator with runtime error checking I'll do the work myself.

                    • debugging
                      genaganna

                       

                      Originally posted by: david_aiken Yes.. it's pretty close, but you get references like

                       

                      C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set 

                       

                      If you double-click on them in the output window they will navigate to the appropriate line in the editor.. or they would if the temporary file still existed. Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier.

                      Presently, only clCreateProgramWithSource is supported. You will be able to do what you are expecting with clCreateProgramWithBinary, which will be available in upcoming releases.

                       

                      This isn't a big issue for me currently. Catching subtle memory overwrites is. I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al.) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate. They are NVidia kernels so I'm waiting for permission to post it here. If there is some way to use the AMD source or an emulator with runtime error checking I'll do the work myself.

                        Is it crashing on both the CPU and the GPU?

                        • debugging
                          david_aiken

                          It crashes when running it against an Intel Core 2 Quad Q6600 and AMD Turion 64 X2. I don't have an AMD GPU yet, regrettably.

                            • debugging
                              genaganna

                               

                              Originally posted by: david_aiken It crashes when running it against an Intel Core 2 Quad Q6600 and AMD Turion 64 X2. I don't have an AMD GPU yet, regrettably.

                               

                               

                              What modifications did you make while porting the sample?

                              Post the code here once you get permission.

                                • debugging
                                  david_aiken

                                  Taking the original RadixSort.cl from the NVidia SDK v.2.3, I did the following to get it working with AMD Stream v2.0-beta4:

                                  1) Copied scan.cl from the NVidia oclScan example next to RadixSort.cl. The code also has to be changed to refer to this file rather than the missing "scan_b.cl".

                                  2) Created separate builds for AMD and NVidia.

                                  3) Modified the code and project settings to work with the AMD environment. Some of the convenience routines and logging were changed and a memory monitor was added. Also added a check for CL_DEVICE_TYPE_CPU.

                                  4) Copied the following AMD DLLs into the AMD output directory:

                                  aticalcl.dll, aticalrt.dll (pulled from recent driver)

                                  OpenCL.dll (from AMD SDK)

                                  5) Running then produces errors in both scan.cl and RadixSort.cl:

                                  <cl file> internal error: array_element_type: non-array type
                                     __local uint numtrue;
                                                  ^
                                  1 catastrophic error detected in the compilation of <cl file>
                                  Compilation aborted.

                                  This is resolved by passing "-DAMD_BUILD" to clBuildProgram for the AMD builds and conditionally removing the __local in both files.

                                  6) Once the .cl files build without errors, running with AMD results in a crash on calling clFinish():

                                  > OCL46C9.tmp.dll!001e14d7()
                                    [Frames below may be incorrect and/or missing, no symbols loaded for OCL46C9.tmp.dll]
                                    OCL46C9.tmp.dll!001e166d()
                                    OpenCL.dll!1001612c()

                                  With NVidia, the test passes in both debug and release builds.
                                  I don't see a way to attach binaries so I've put the project/source at http://rapidshare.com/files/299338017/oclRadixSort.zip.html.
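The workaround in step 5 can be sketched as follows. This is only one way to arrange it, under stated assumptions: SHARED_SCALAR is a hypothetical macro name, and the prologue is prepended on the host rather than edited into the NVidia files. The kernel would then declare `SHARED_SCALAR uint numtrue;` instead of `__local uint numtrue;`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical prologue prepended to the .cl source before it is handed to
   clCreateProgramWithSource(). SHARED_SCALAR is an illustrative name only. */
static const char kernel_prologue[] =
    "#ifdef AMD_BUILD\n"
    "#define SHARED_SCALAR\n"
    "#else\n"
    "#define SHARED_SCALAR __local\n"
    "#endif\n";

/* Options string for the fourth argument of clBuildProgram(). */
const char *build_options(int is_amd_build)
{
    return is_amd_build ? "-DAMD_BUILD" : "";
}

/* Writes prologue + source into out.
   Returns 0 on success, -1 if out is too small. */
int with_prologue(const char *source, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s%s", kernel_prologue, source);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

On the host, the AMD build would then call something like `clBuildProgram(program, 0, NULL, build_options(1), NULL, NULL)`. One caveat worth checking: without the __local qualifier, a variable declared inside a kernel is private to each work-item rather than shared across the work-group. If the original kernel relies on one work-item writing numtrue and the others reading it, the workaround changes behavior, and that might be worth ruling out as a contributor to the crash in step 6.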


                                    • debugging
                                      genaganna

                                      It fails to allocate device memory for mBlockOffsets on the GPU (line 57, RadixSort.cpp).

                                      Try the following:

                                         Select a smaller value for numElements.

                                         WORKGROUP_SIZE must be <= 256 on the GPU.

                                       

                                      Yes, it is crashing on the CPU at my end also. The algorithm is too complex.

                        • debugging
                          MicahVillmow
                          david_aiken,
                          The crash is most likely coming from a buffer overflow on the local/private/global memory. I don't have your code, but if you increase the amount of local/global/private memory, does the crash go away?

                          This is one problem with directly porting GPU code: on the GPU, overflows are stopped by the hardware, but this is not the case on the CPU.
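One way to turn the "increase the memory and see if the crash goes away" experiment into something more diagnostic is a guard region. The sketch below is a generic host-side technique, not an AMD SDK feature. It assumes each cl_mem object is created with payload_bytes + guard_bytes, that the initial contents including the guard pattern are uploaded with clEnqueueWriteBuffer(), and that the whole buffer is read back with clEnqueueReadBuffer() after the kernel finishes. The function names and the 0xA5 pattern are illustrative choices.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_BYTE 0xA5u   /* arbitrary, easily recognized fill pattern */

/* Fill the guard region that follows the payload in a host staging buffer.
   `host` must be at least payload_bytes + guard_bytes long. */
void guard_fill(unsigned char *host, size_t payload_bytes, size_t guard_bytes)
{
    memset(host + payload_bytes, GUARD_BYTE, guard_bytes);
}

/* Inspect a buffer that was read back from the device. Returns the offset
   past the end of the payload of the first damaged guard byte, or -1 if
   the guard region is intact. */
long guard_first_damage(const unsigned char *host, size_t payload_bytes,
                        size_t guard_bytes)
{
    for (size_t i = 0; i < guard_bytes; ++i) {
        if (host[payload_bytes + i] != GUARD_BYTE) {
            return (long)i;
        }
    }
    return -1;
}
```

If the padding is large enough to absorb the stray writes, the run may complete and a damaged guard then points at the buffer that is being overrun. This only catches writes that land just past the end of a padded buffer. Reads out of bounds and writes elsewhere go unnoticed.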
                          • debugging
                            MicahVillmow
                            The memory size is the size of memory assigned to a specific cl_mem object.

                            Micah
                              • debugging
                                david_aiken

                                Well.. I reduced numElements down to 16Kb and, as genaganna also reported, still got a crash. I can play with different buffers, but I don't know if I'm addressing an underlying problem or just moving the symptoms around.

                              • debugging
                                MicahVillmow
                                david_aiken,
                                Try modifying the size of the local memory inside the kernel.

                                Micah
                                • debugging
                                  MicahVillmow
                                  I would need to see kernel source to know that.
                                    • debugging
                                      david_aiken

                                      You have it at the rapidshare link posted above. The kernel is almost identical to the NVidia kernel, but there was a complaint from the AMD compiler regarding one of the local variables. The issue didn't seem like it would cause a problem. 

                                      It's an implementation of Satish's recent paper and at the time of publication was considered to be the fastest GPU sort. I need to extend it and add other operations, and your CPU-based approach seems good, but source would allow us to take full advantage of the dev environment (and GPUs). It would be nice if OpenCL were open source.