I've implemented a naive 3D Game of Life simulation with Brook+. I've noticed that, for some 32x32x32 initial configurations, 1 or 2 elements are wrong after 6 generations.
After 5 generations the resulting configuration is correct. Also, if I restart the simulation using the generation-5 output as input and run it for 1 generation, I get a correct result, as opposed to running for 6 generations in a row from the start.
When running with BRT_RUNTIME=cpu, the simulation always produces correct results. With other grid sizes (like 10x10x10), the Brook+ simulation eventually produces wrong results as well.
Possible fishy spots in my implementation:
- I'm using 3D streams of int
- I'm using a 3D input matrix of int for the main kernel
- on the CPU side there is a for loop over generations, which calls the kernel once per generation
- all math inside the kernel is with int and int3
- each kernel call does O(n^3) work (one update per cell)
- there are a few if blocks inside the kernel (see the sketch after this list)
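
For reference, here is a simplified sketch of the kernel structure I mean. This is illustrative, not my exact code: `life_step` is a placeholder name, the 4/5 rule thresholds are just an example 3D rule, and the exact return type of `indexof()` depends on the Brook+ version.

```
// Simplified sketch of the per-cell kernel (not my exact code).
// grid is a 3D gather stream; next is the matching 3D output stream.
kernel void life_step(int grid[][][], out int next<>)
{
    // indexof() yields the position of the output element being computed;
    // in my real kernel all arithmetic is done with int and int3.
    int3 idx = indexof(next);
    int alive = 0;
    int dz, dy, dx;

    // Count the 26 neighbors; boundary handling is omitted for brevity.
    for (dz = -1; dz <= 1; dz++)
        for (dy = -1; dy <= 1; dy++)
            for (dx = -1; dx <= 1; dx++)
                if (dx != 0 || dy != 0 || dz != 0)
                    alive += grid[idx.z + dz][idx.y + dy][idx.x + dx];

    // A few if blocks like these apply the survival/birth rule
    // (the 4/5 thresholds here are an example, not my actual rule).
    if (grid[idx.z][idx.y][idx.x] != 0)
        next = (alive == 4 || alive == 5) ? 1 : 0;   // survival
    else
        next = (alive == 5) ? 1 : 0;                 // birth
}
```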
Shouldn't the results from BRT_RUNTIME=cpu be identical to the results with BRT_RUNTIME=cal?
My implementation doesn't need any synchronized access to resources. In a for loop, it calls the kernel with input A and output B, then calls the kernel again with input B and output A (maybe there's a race there, if the GPU is still performing work from the first kernel call when the second one is issued?).
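
Here is roughly what that loop looks like (again a sketch with placeholder names: `init_configuration` and `GENERATIONS` are not real identifiers from my code; `streamRead`/`streamWrite` are the standard Brook+ transfer calls):

```
#define GENERATIONS 6   // placeholder

// Sketch of the host-side ping-pong loop; error checking omitted.
int main(void)
{
    int cellsIn[32 * 32 * 32];
    int cellsOut[32 * 32 * 32];
    int a<32, 32, 32>;   // stream A
    int b<32, 32, 32>;   // stream B
    int gen;

    init_configuration(cellsIn);   // hypothetical helper
    streamRead(a, cellsIn);        // upload the initial grid

    // Ping-pong: A -> B, then B -> A, two generations per iteration.
    // (An odd generation count would leave the result in b; omitted here.)
    for (gen = 0; gen < GENERATIONS / 2; gen++) {
        life_step(a, b);
        life_step(b, a);
    }

    streamWrite(a, cellsOut);      // read back the final grid
    return 0;
}
```

I'd expect the runtime to order the two back-to-back life_step calls through their shared stream dependency, but I haven't seen that guaranteed anywhere for the CAL backend.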
Environment:
- Windows Server 2008 x64
- Brook+ and CAL 1.2.1 beta, x64
- Visual Studio 2008 SP1, Release x64