I have a bug somewhere in my library. The library runs about 50 different kernels; the source of each kernel is customized at run time before it is built.
These 50 kernels constitute a "batch". I run this batch 20,000 times; let's call these 20,000 runs a single cycle. The whole program execution consists of about 35 cycles.
The library is a convolutional neural network library. The whole program execution is ANN training (of the online kind).
The program executes fine on the GPU (Cayman). It shows the expected results (and good performance, by the way).
When I start using the CPU device (i2500) it shows weird results. In some cases I see NaNs immediately, in the results of the 1st cycle. In other cases the 1st cycle is fine and I see NaNs in the results of the 2nd cycle. It is non-deterministic: I reboot the computer, run the program on the same data, and see a different result.
I am doing an extensive code review right now. Still, my guess is that I (well, the kernel code) somehow make out-of-bounds accesses to buffers. I mean:
- I create a buffer of 40 bytes (10 floats x 4 bytes per float) in the run-time environment.
- I call setArg to pass this buffer to some kernel and enqueue the kernel for execution.
- The kernel accesses the buffer (for read or write, it doesn't matter).
And my guess is that on the GPU the buffers might happen to be laid out in a way that hides the bug (for example, reading 0.0F from an out-of-range location might not ruin the result), while on the CPU the kernels read from or write to the wrong location and ruin the result.
Well, here is my question. Is there any way to have run-time checks of reads/writes inside kernels against the actual sizes of the buffers passed as arguments?
P.S. Catalyst 11.4, APP SDK 2.4, VS2010, R6950, i2500, Win7 Ultimate 64bit.