foxx1337

possible bug with int math on the GPU inside cpu loop

Discussion created by foxx1337 on Oct 25, 2008
Latest reply on Oct 27, 2008 by foxx1337
works with BRT_RUNTIME=cpu

Hello again,

I've implemented a naive 3D Game of Life simulation with Brook+. I've noticed that, for some 32x32x32 initial configurations, 1 or 2 elements are wrong after 6 generations.

After 5 generations the resulting configuration is correct. Also, if I restart the simulation using the generation-5 output as input and run it for 1 more generation, I obtain a correct result, as opposed to running for 6 generations in a row from the start.

When running with BRT_RUNTIME=cpu, the simulation always produces correct results. With other grid sizes (like 10x10x10), the Brook+ simulation eventually produces wrong results as well.

Possibly fishy spots in my implementation:

- I'm using 3D streams of int
- I'm using a 3D input matrix of int for the main kernel
- on the CPU I have a for loop, one iteration per generation, which calls the kernel
- all math inside the kernel is with int and int3
- kernel is O(n^3)
- there are a few if blocks inside the kernel
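For reference, the per-cell logic looks roughly like this when written on the CPU with plain arrays (a sketch only: the grid size and the birth/survival thresholds here are placeholders for illustration, not necessarily the ones in my actual kernel):

```c
#include <assert.h>

#define N 4  /* tiny grid for illustration; the real one is 32 */

/* Count live cells in the 26-neighborhood of (x,y,z),
   treating out-of-bounds cells as dead. All math is int. */
static int neighbors(const int g[N][N][N], int x, int y, int z)
{
    int count = 0;
    for (int dx = -1; dx <= 1; ++dx)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dz = -1; dz <= 1; ++dz) {
                if (dx == 0 && dy == 0 && dz == 0) continue;
                int nx = x + dx, ny = y + dy, nz = z + dz;
                if (nx < 0 || nx >= N || ny < 0 || ny >= N ||
                    nz < 0 || nz >= N) continue;
                count += g[nx][ny][nz];
            }
    return count;
}

/* One generation. Placeholder rule: a dead cell is born with
   exactly 6 live neighbors, a live cell survives with 5 or 6. */
static void step(const int in[N][N][N], int out[N][N][N])
{
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z) {
                int n = neighbors(in, x, y, z);
                out[x][y][z] = in[x][y][z] ? (n == 5 || n == 6)
                                           : (n == 6);
            }
}
```

This CPU version always matches itself generation after generation, which is what makes the GPU divergence after exactly 6 generations so puzzling.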

Shouldn't the results with BRT_RUNTIME=cpu be identical to the results with BRT_RUNTIME=cal?

My implementation doesn't need any synchronized access to resources. In a for loop, it calls the kernel with input A and output B, then calls the kernel again with input B and output A (maybe there's a race there, if the GPU is still performing work from the first kernel call when the second one is issued?).
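The loop structure I mean, sketched on the CPU with flat arrays (kernel3d here is a trivial stand-in for the Brook+ kernel call, just enough to show the buffer alternation; the real kernel does the Life rule):

```c
#include <assert.h>

#define CELLS 8  /* flattened grid size, tiny for illustration */

/* Stand-in for the Brook+ kernel: copies input to output with a
   visible change, so the alternation is easy to check. */
static void kernel3d(const int *in, int *out)
{
    for (int i = 0; i < CELLS; ++i)
        out[i] = in[i] + 1;
}

/* Ping-pong loop: each generation reads one buffer and writes the
   other, so no element is ever read and written by the same call. */
static void run(int *a, int *b, int generations)
{
    for (int g = 0; g < generations; ++g) {
        if (g % 2 == 0)
            kernel3d(a, b);  /* even generations: A -> B */
        else
            kernel3d(b, a);  /* odd generations:  B -> A */
    }
    /* final state is in a if generations is even, in b if odd */
}
```

On the CPU this is obviously race-free; my question is whether the CAL backend is allowed to start the second kernel call before the first one has finished writing its output stream.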

Windows Server 2008 x64
Radeon HD 4850
Catalyst 8.10
Brook+ and CAL 1.2.1 beta x64
Visual Studio 2008 SP1, Release x64
