Possible bug with int math on the GPU inside a CPU loop

Discussion created by foxx1337 on Oct 25, 2008
Latest reply on Oct 27, 2008 by foxx1337
works with BRT_RUNTIME=cpu

Hello again,

I've implemented a naive 3D Game of Life simulation with Brook+. For some 32x32x32 initial configurations, I've noticed that 1 or 2 elements are wrong after 6 generations.

After 5 generations the resulting configuration is correct. Also, if I start the simulation again, using the previous 5 generation output as input and running for 1 generation, I obtain a correct result, as opposed to running for 6 generations in a row from the start.
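
The generation-by-generation comparison described above can be sketched as a small harness. This is only an illustration in Python, not the Brook+ code from the post: `first_divergence`, `step_ref` and `step_test` are hypothetical names, where `step_ref` stands in for one generation computed with BRT_RUNTIME=cpu and `step_test` for one generation computed on the GPU.

```python
def first_divergence(step_ref, step_test, start, generations):
    """Return (generation, differing_indices) for the first mismatch, or None.

    Both step functions take a flat list of ints and return a new list.
    Each backend is advanced from its own previous output, so an error
    that only shows up after several chained calls is not hidden.
    """
    ref = list(start)
    test = list(start)
    for gen in range(1, generations + 1):
        ref = step_ref(ref)
        test = step_test(test)
        # Collect the indices of every element that differs in this generation.
        diff = [i for i, (a, b) in enumerate(zip(ref, test)) if a != b]
        if diff:
            return gen, diff
    return None
```

Knowing the first bad generation and the exact indices of the wrong elements makes it easier to tell whether the error depends on the data or on the number of chained kernel calls.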

When running with BRT_RUNTIME=cpu, the simulation always produces correct results. With other grid sizes (like 10x10x10), the Brook+ simulation eventually produces wrong results.


Possibly fishy spots in my implementation:

- I'm using 3D streams of int
- I'm using a 3D input matrix of int for the main kernel
- on the CPU I have a for loop over the generations, which calls the kernel once per generation
- all math inside the kernel is with int and int3
- kernel is O(n^3)
- have a few if blocks inside the kernel
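
For comparison against the GPU output, a plain integer reference step could look like the sketch below. It is written in Python for illustration and rests on assumptions the post does not state: a 26-cell neighbourhood, cells outside the grid counted as dead, and the "4555" rule (a live cell survives with 4 or 5 live neighbours, a dead cell is born with exactly 5). The name `life3d_step` and the flat indexing scheme are hypothetical.

```python
def life3d_step(grid, n):
    """One generation of a 3D Life variant using only integer math.

    ASSUMPTIONS (not from the original post): 26-cell neighbourhood,
    out-of-range cells are dead, rule "4555".
    grid is a flat list of 0/1 ints of length n*n*n, index x + n*(y + n*z).
    """
    out = [0] * (n * n * n)
    for z in range(n):
        for y in range(n):
            for x in range(n):
                neighbours = 0
                for dz in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            if dx == 0 and dy == 0 and dz == 0:
                                continue  # skip the cell itself
                            nx, ny, nz = x + dx, y + dy, z + dz
                            if 0 <= nx < n and 0 <= ny < n and 0 <= nz < n:
                                neighbours += grid[nx + n * (ny + n * nz)]
                idx = x + n * (y + n * z)
                if grid[idx] == 1 and neighbours in (4, 5):
                    out[idx] = 1  # survival
                elif grid[idx] == 0 and neighbours == 5:
                    out[idx] = 1  # birth
    return out
```

Because every operation here is an exact integer operation, any backend implementing the same rule should reproduce this output element for element.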


Shouldn't the results from BRT_RUNTIME=cpu be identical to the results with BRT_RUNTIME=cal?

My implementation doesn't need any synchronized access to resources. In a for loop, it calls the kernel with input A and output B, then calls the kernel again with input B and output A. (Maybe there's a race there, since the GPU could still be working on the first kernel call when the second one is issued?)
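
The A/B alternation described above is the usual ping-pong buffer pattern. A minimal sketch in Python, where `run_generations` and `kernel(src, dst)` are hypothetical stand-ins for the host loop and the Brook+ kernel call:

```python
def run_generations(kernel, a, b, generations):
    """Advance `generations` steps, alternating the read and write buffers.

    `kernel(src, dst)` is a hypothetical stand-in that fills dst from src.
    Returns the buffer that holds the final generation.
    """
    src, dst = a, b
    for _ in range(generations):
        kernel(src, dst)
        src, dst = dst, src  # this step's output is the next step's input
    return src
```

In this sequential sketch each call finishes before the next one starts, so it cannot show whether the real runtime orders back-to-back kernel calls on the same streams. One way to probe the race hypothesis would be to read the output stream back to host memory after every generation and check whether the wrong elements disappear.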



Windows Server 2008 x64
Catalyst 8.10
Brook+ and CAL 1.2.1 beta x64
Visual Studio 2008 SP1, release x64