david_aiken
Journeyman III

debugging

There is a recent interview with some of the AMD devs (http://forums.amd.com/devblog/blogpost.cfm?catid=335&threadid=120276) which includes the comment "...the OpenCL CPU implementation leverages the CPU hardware debug features to provide excellent debug capabilities, using familiar debug environments, at full CPU speeds.".

I've probably missed it, but is there any debug support for Visual Studio 2008 on Vista planned for kernels running on the CPU, or perhaps within a GPU emulator? It would be great to catch kernel memory and build issues in Visual Studio.

0 Likes
20 Replies
jmundy
Journeyman III

I second this query. Even without Visual Studio integration, is there a way to view kernel compiler error messages? Right now clBuildProgram just returns a numeric code indicating that the program build failed.

0 Likes

Originally posted by: jmundy is there a way to view kernel compiler error messages? Right now clBuildProgram just returns a numeric code indicating that the program build failed.


You can get the build log using clGetProgramBuildInfo() API call.

0 Likes

Yes.. it's pretty close, but you get references like

C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set 

If you double-click on them in the output window they will navigate to the appropriate line in the editor.. or they would if the temporary file still existed. Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier.
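The path sweep mentioned above does not strictly need a regex. Here is a minimal sketch in plain C, assuming every diagnostic line carries the temporary file name in the form "<temp path>.tmp.cl(<line>)" as in the warning quoted above. The function name rewrite_log_paths and its parameters are made up for illustration and are not part of any OpenCL SDK.

```c
#include <stdio.h>
#include <string.h>

/* Rewrites each build-log line of the form "<temp path>.tmp.cl(<line>): ..."
 * so that it starts with original_path instead of the temp path.
 * Lines without the ".tmp.cl(" marker are copied unchanged.
 * Returns 0 on success, -1 if out is too small.
 * (Hypothetical helper; the marker format is an assumption.) */
static int rewrite_log_paths(const char *log, const char *original_path,
                             char *out, size_t out_size)
{
    static const char marker[] = ".tmp.cl(";
    size_t used = 0;

    if (out_size == 0) return -1;
    out[0] = '\0';

    while (*log != '\0') {
        /* Find the end of the current line (newline included, if any). */
        const char *eol = strchr(log, '\n');
        size_t line_len = eol ? (size_t)(eol - log) + 1 : strlen(log);

        /* Look for the marker, but accept it only inside the current line. */
        const char *hit = strstr(log, marker);
        const char *rest = log;
        size_t rest_len = line_len;
        const char *prefix = "";

        if (hit != NULL && hit < log + line_len) {
            /* Keep "(54): warning: ..." and drop the temp path before it. */
            rest = hit + strlen(marker) - 1;   /* points at '(' */
            rest_len = line_len - (size_t)(rest - log);
            prefix = original_path;
        }

        int n = snprintf(out + used, out_size - used, "%s%.*s",
                         prefix, (int)rest_len, rest);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
        log += line_len;
    }
    return 0;
}
```

Feeding it the build log and "RadixSort.cl" (or a full path to the original file) should produce lines the output window can navigate to. It only looks at the first marker on each line, so a log format that differs from the one above would need a different match.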

This isn't a big issue for me currently. Catching subtle memory overwrites is. I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate. They are NVidia kernels so i'm waiting for permission to post it here. If there is some way to use the AMD source or an emulator with runtime error checking i'll do the work myself.

0 Likes

Originally posted by: david_aiken Yes.. it's pretty close, but you get references like

 

C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set 

 

If you double-click on them in the output window they will navigate to the appropriate line in the editor.. or they would if the temporary file still existed. Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier.

At present only clCreateProgramWithSource is supported. What you are asking for should be possible with clCreateProgramWithBinary, which will be available in an upcoming release.

 

This isn't a big issue for me currently. Catching subtle memory overwrites is. I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate. They are NVidia kernels so i'm waiting for permission to post it here. If there is some way to use the AMD source or an emulator with runtime error checking i'll do the work myself.

Is it crashing on both the CPU and the GPU?

0 Likes

It crashes when running it against an Intel Core 2 Quad Q6600 and AMD Turion 64 X2. I don't have an AMD GPU yet, regrettably.

0 Likes

Originally posted by: david_aiken It crashes when running it against an Intel Core 2 Quad Q6600 and AMD Turion 64 X2. I don't have an AMD GPU yet, regrettably.

 

 

What modifications did you make while porting the sample?

Post the code here once you get permission.

0 Likes

Taking the original RadixSort.cl from the NVidia SDK v.2.3, I did the following to get it working with AMD Stream v2.0-beta4:

1) copied scan.cl from oclScan NVidia example next to RadixSort.cl. The code also has to be changed to refer to this file rather than the missing "scan_b.cl".

2) create separate builds for AMD and NVidia.

3) modify the code and project settings to work with the AMD environment. Some of the convenience routines and logging were changed and a memory monitor added. Also added check for CL_DEVICE_TYPE_CPU.

4) copy the following AMD dlls into the AMD output directory:

aticalcl.dll, aticalrt.dll (pulled from recent driver)

OpenCL.dll (from AMD SDK)

5) running results in errors in both scan.cl and radixsort.cl:

<cl file> internal error: array_element_type: non-array type

   __local uint numtrue;

               ^

1 catastrophic error detected in the compilation of <cl file>

Compilation aborted.

This is resolved by passing "-DAMD_BUILD" to clBuildProgram for the AMD builds and conditionally removing the __local in both files.

6) once the .cl files build without errors, running with AMD results in a crash on calling clFinish():

> OCL46C9.tmp.dll!001e14d7()

  [Frames below may be incorrect and/or missing, no symbols loaded for OCL46C9.tmp.dll]

  OCL46C9.tmp.dll!001e166d()

  OpenCL.dll!1001612c()

Running with NVidia in both debug and release builds results in a passed test.
I don't see a way to attach binaries so i've put the project/source at http://rapidshare.com/files/299338017/oclRadixSort.zip.html.


0 Likes

It fails to allocate device memory for mBlockOffsets on the GPU (line 57 of RadixSort.cpp).

Try the following:

   Select a small value for numElements.

   WORKGROUP_SIZE must be <= 256 for the GPU.

 

Yes, it also crashes on the CPU at my end. The algorithm is too complex.

0 Likes

Are you saying that it works for you on the GPU if you change these settings? If so, it would help if you could tell me which GPU you use and how many elements can you sort.

The algorithm is adapted from "

0 Likes

I tried different values of numElements. It crashes in different places.

It takes a lot of time to understand the code. We hope to reply as soon as possible.

0 Likes

Is it possible to get access to the AMD OpenCL CPU code under NDA? A call stack with source would really help to track down these mysterious crashes.

0 Likes

david_aiken,
The crash is most likely coming from a buffer overflow on the local/private/global memory. I don't have your code, but if you increase the amount of local/global/private memory, does the crash go away?

This is one problem with directly porting GPU code: overflows are stopped by the hardware on the GPU, but this is not the case on the CPU.
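One way to look for such an overflow without runtime support is to port the suspect kernel logic to host-side C and route every store through a range check. The sketch below is only an illustration of that idea. The checked_buf type and checked_store function are hypothetical names and are not part of OpenCL or the AMD SDK.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug wrapper around a buffer: it records the element count
 * so that every store can be range-checked. */
typedef struct {
    unsigned *data;
    size_t    count;
    size_t    violations;   /* number of rejected out-of-range stores */
} checked_buf;

/* Stores value at index if it is in range. Otherwise logs the attempt,
 * counts it, and leaves the buffer untouched.
 * Returns 1 on success, 0 on a rejected store. */
static int checked_store(checked_buf *buf, size_t index, unsigned value,
                         const char *where)
{
    if (index >= buf->count) {
        buf->violations++;
        fprintf(stderr,
                "%s: store to index %lu, but buffer holds %lu elements\n",
                where, (unsigned long)index, (unsigned long)buf->count);
        return 0;
    }
    buf->data[index] = value;
    return 1;
}
```

The where string can name the kernel and the variable being written, so the first rejected store points at the place where an index goes out of range rather than at the later crash.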
0 Likes

Can you tell me where the process for setting the size of these pools is described?

0 Likes

The memory size is the size of memory assigned to a specific cl_mem object.

Micah
0 Likes

Well.. i reduced the numElements down to 16Kb and, as also reported by genaganna, still got a crash. I can play with different buffers, but i don't know if i'm addressing an underlying problem or just moving the symptoms around.

0 Likes

david_aiken,
Try modifying the size of the local memory inside the kernel.

Micah
0 Likes

Which variable in particular do you think would be best?

0 Likes

I would need to see kernel source to know that.
0 Likes

You have it at the rapidshare link posted above. The kernel is almost identical to the NVidia kernel, but there was a complaint from the AMD compiler regarding one of the local variables. The issue didn't seem like it would cause a problem. 

It's an implementation of Satish's recent paper and at the time of publication was considered to be the fastest GPU sort. I need to extend it and add other operations and your CPU-based approach seems good, but source would allow us to take full advantage of the dev environment (and GPUs). It would be nice if OpenCL was Open Source.

0 Likes

The problem seems to be due to the local memory variable numtrue in rank4(). Removing the __local isn't a valid workaround because the variable must be updated by the group in order to calculate a valid rank. An invalid rank causes memory corruption in the calling function.
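To illustrate why the shared count matters, here is a simplified scalar model of a one-bit split in plain C. It is not the NVidia kernel. The rank formula is an assumption based on the usual split primitive: elements with a true predicate keep their scan address, and the rest are placed after all true elements. The function names are made up for this sketch.

```c
#include <stddef.h>

/* Simplified scalar model of a one-bit split. numtrue stands for the
 * group-wide count of true predicates that every work-item must see. */
static void split_ranks(const unsigned *preds, size_t n,
                        unsigned numtrue, unsigned *rank)
{
    unsigned address = 0;   /* exclusive prefix sum of preds */
    for (size_t i = 0; i < n; i++) {
        rank[i] = preds[i] ? address : numtrue + (unsigned)i - address;
        address += preds[i] ? 1u : 0u;
    }
}

/* Returns 1 if rank[0..n) is a permutation of 0..n-1, else 0.
 * Limited to n <= 64 to keep the sketch free of allocation. */
static int ranks_are_permutation(const unsigned *rank, size_t n)
{
    unsigned char seen[64] = {0};
    if (n > 64) return 0;
    for (size_t i = 0; i < n; i++) {
        if (rank[i] >= n || seen[rank[i]]) return 0;
        seen[rank[i]] = 1;
    }
    return 1;
}
```

With the correct group-wide numtrue the ranks form a permutation, so each element gets its own output slot. If a work-item sees a stale or uninitialized numtrue, which is what a private copy would give it, ranks collide or fall outside the buffer, and the scatter in the caller then overwrites memory it should not touch.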

I tried the workaround suggested by mjharvey (http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=120374). It still crashed.

I also tried passing a __local into the kernel. This still crashes with my AMD environment.

The workaround approaches run ok in my NVidia environment. The project is at http://rapidshare.com/files/301718896/oclRadixSortSentToAMDForumWithLocalMemFix.zip.html



0 Likes