I have a bug somewhere in my library. The library runs about 50 different kernels; the source of each kernel is customized at run time before it is built.
These 50 kernels constitute a "batch". I run these batches 20,000 times. Let's call these 20,000 runs a single cycle. The whole program execution consists of about 35 cycles.
The library is a convolutional neural network library. The whole program execution is ANN training (one of the online kind).
The program executes fine on the GPU (Cayman). It shows the expected results (and good performance, by the way).
When I switch to the CPU device (i2500) it shows weird results. In some cases I see NaNs immediately, in the results of the 1st cycle. In some cases the 1st cycle is fine and I see NaNs in the results of the 2nd cycle. It is non-deterministic: I reboot the computer, run the program on the same data, and see a different result.
I am doing an extensive code review right now. Still, my guess is that I (well, the kernel code) somehow access buffers out of bounds. I mean:
And my guess is that on the GPU the buffers happen to be laid out in such a way that the bug doesn't show (for example, reading 0.0F from an out-of-range location might not ruin the result). But when run on the CPU, the kernels read from or write to the wrong location and ruin the result.
Well, here is my question. Is there any way to have run-time checks for reads/writes inside kernels against the actual sizes of the buffers passed as parameters?
P.S. Catalyst 11.4, APP SDK 2.4, VS2010, R6950, i2500, Win7 Ultimate 64bit.
Originally posted by: MicahVillmow Maximmoroz, if you are on Linux, try running Valgrind to do some basic runtime checks. Another project that was brought to my attention recently is here: http://code.google.com/p/addre...ressSanitizerAlgorithm
I am on Windows 7 Ultimate 64bit.
Originally posted by: nou
but on CPU you should get segmentation fault.
It would be great if I got this segmentation fault. But I don't. Maybe the faulty kernel accesses some other buffer (I have about 70 buffers)? Who knows.
as you have quite small buffers it is possible that all of them are allocated within a few pages, so you don't get a segfault because you are still accessing your own memory.
one suggestion: create the buffers with a greater size than you actually use and initialize them with zeroes. after the kernel launch you can inspect whether the spare part of each buffer has been corrupted.
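The padding idea above can be sketched in plain C on host memory. This is only an illustration under stated assumptions: the names `guarded_alloc`, `guarded_payload`, `guarded_check` and the `GUARD_BYTES` size are hypothetical, not part of any SDK. In a real OpenCL program the padded allocation would be the buffer passed to `clCreateBuffer`, the kernels would have to address the payload at an offset past the leading guard, and the check would run on data read back with `clEnqueueReadBuffer`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 64    /* hypothetical padding size on each side */
#define GUARD_VALUE 0x00  /* zero fill, as suggested in the post above */

/* Allocate payload_bytes plus a guard band before and after it.
   The whole block, guards included, is filled with GUARD_VALUE. */
unsigned char *guarded_alloc(size_t payload_bytes)
{
    size_t total = payload_bytes + 2 * GUARD_BYTES;
    unsigned char *raw = malloc(total);
    if (raw != NULL)
        memset(raw, GUARD_VALUE, total);
    return raw;
}

/* Pointer to the usable region, just past the leading guard band. */
unsigned char *guarded_payload(unsigned char *raw)
{
    return raw + GUARD_BYTES;
}

/* Count guard bytes that no longer hold GUARD_VALUE.
   A non-zero result means something wrote outside the payload. */
size_t guarded_check(const unsigned char *raw, size_t payload_bytes)
{
    size_t corrupted = 0;
    assert(raw != NULL);
    for (size_t i = 0; i < GUARD_BYTES; ++i) {
        if (raw[i] != GUARD_VALUE)
            ++corrupted;                               /* leading guard */
        if (raw[GUARD_BYTES + payload_bytes + i] != GUARD_VALUE)
            ++corrupted;                               /* trailing guard */
    }
    return corrupted;
}
```

One caveat: with a zero fill, a stray write of 0.0f into the guard band goes unnoticed. Filling the guards with a non-zero byte pattern instead would catch that case as well.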
and it is of course possible that there is a bug in the compiler. i had a similar issue with my program: on CPU it runs fine but the GPU version returns wrong results.
Actually, the allocated buffers are up to several megabytes in size.
Your suggestion of manually detecting out-of-bounds writes is a good one, thank you. It is costly and limited to detecting writes only (no detection of out-of-bounds reads); still, I might end up trying it.
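Out-of-bounds reads could in principle be caught too, at extra cost, by routing every buffer access through a checked index. The sketch below is plain C and purely illustrative: `checked_index` and `window_sum` are hypothetical names, and the 3-element window is just a stand-in for a convolution read pattern. In an OpenCL kernel the element count would have to be passed as an extra kernel argument, and the violation counter would live in a `__global int` buffer incremented with `atomic_inc`.

```c
#include <stddef.h>

/* Hypothetical helper: return i when it is inside [0, n); otherwise
   record the violation and fall back to index 0 so the access itself
   stays in bounds. Assumes n >= 1. */
static size_t checked_index(size_t i, size_t n, int *violations)
{
    if (i < n)
        return i;
    ++*violations;
    return 0;
}

/* Stand-in for a kernel body: sum three consecutive elements starting
   at `start`, with every read going through checked_index. */
float window_sum(const float *in, size_t n, size_t start, int *violations)
{
    float sum = 0.0f;
    for (size_t k = 0; k < 3; ++k)
        sum += in[checked_index(start + k, n, violations)];
    return sum;
}
```

After a run, a non-zero violation count says that some access fell outside the buffer; recording the offending index alongside the count would help to locate it. The checks can be compiled out once the bug is found.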
No, it seems it is not a bug in the compiler: yesterday I tried running the program with the alpha version of Intel's OpenCL implementation and got similarly weird results. It might be a bug in the LLVM compiler (as both implementations use it), but I doubt it.
I would suggest using GDB to debug the application. Watch the iterator values at various positions (generally inside loops that modify the buffers). This should help you figure something out.
Originally posted by: himanshu.gautam I would suggest using GDB to debug the application. Watch the iterator values at various positions (generally inside loops that modify the buffers). This should help you figure something out.
Did you try GDB on Windows (with Cygwin or MinGW)? I ask because I am developing in VS2010 and for Windows, and I find the visual debugger integrated into VS2010 very convenient and helpful.
Oh, so this is my top priority (and actually the only one) issue with AMD APP SDK: no visual debugger for VS2010. The profiler is fine, but a debugger would be really nice.
Himanshu, thanks for the idea. Maybe I will be forced to use these console tools.
Well, currently that is the only way to debug your OpenCL kernels, and even then only on the CPU. You will find information on how to use it in chapter 3 of the AMD APP OpenCL Programming Guide.
It might seem hard, but GDB is a very useful and easy-to-learn tool.