

foxx1337
Adept I

possible bug with int math on the GPU inside cpu loop

works with BRT_RUNTIME=cpu

Hello again,

I've implemented a naive 3D Game of Life simulation with Brook+. I've noticed that for some 32x32x32 initial configurations, 1 or 2 elements are wrong after 6 generations.

After 5 generations the resulting configuration is correct. Also, if I start the simulation again, using the output of those 5 generations as input and running for 1 more generation, I obtain a correct result, as opposed to running for 6 generations in a row from the start.

When running with BRT_RUNTIME=cpu, the simulation always produces correct results. With other grid sizes (like 10x10x10), the Brook+ simulation on the GPU eventually produces wrong results as well.

 

Possible fishy spots of my implementation:

- I'm using 3D streams of int
- I'm using a 3D input matrix of int for the main kernel
- on the CPU I have a for loop over the generations, which calls the kernel once per generation
- all math inside the kernel is with int and int3
- kernel is O(n^3)
- have a few if blocks inside the kernel
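
For reference, the per-cell update described in the list above can be sketched on the CPU in plain Python. This is only an illustration: the birth/survival thresholds and the dead-boundary handling are assumptions (the post does not state them), and `step`, `BIRTH`, `SURVIVE` and `OFFSETS` are hypothetical names, not part of the Brook+ code.

```python
from itertools import product

# Hypothetical rule, NOT taken from the post: a dead cell is born with a
# neighbour count in BIRTH, a live cell survives with a count in SURVIVE.
BIRTH = {6}
SURVIVE = {5, 6, 7}

# The 26 neighbour offsets of a cell in 3D (everything except (0, 0, 0)).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def step(grid):
    """Return the next generation of a cubic grid stored as grid[z][y][x] of 0/1 ints."""
    n = len(grid)
    nxt = [[[0] * n for _ in range(n)] for _ in range(n)]
    for z, y, x in product(range(n), repeat=3):
        count = 0
        for dz, dy, dx in OFFSETS:
            zz, yy, xx = z + dz, y + dy, x + dx
            # Assumption: cells outside the grid count as dead (no wrap-around).
            if 0 <= zz < n and 0 <= yy < n and 0 <= xx < n:
                count += grid[zz][yy][xx]
        if grid[z][y][x]:
            nxt[z][y][x] = 1 if count in SURVIVE else 0
        else:
            nxt[z][y][x] = 1 if count in BIRTH else 0
    return nxt
```

Everything here is integer arithmetic on a freshly allocated output grid, so a sequential run of this reference is deterministic and can serve as the baseline to check GPU output against.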

 

Shouldn't the results from BRT_RUNTIME=cpu be identical to the results with BRT_RUNTIME=cal?
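
To pin down exactly which elements diverge between the two runtimes, the outputs can be compared cell by cell. A minimal sketch in Python, assuming both results have been read back into nested lists indexed `[z][y][x]`; `diff_cells` is a hypothetical helper, not part of Brook+:

```python
def diff_cells(a, b):
    """Return (z, y, x, a_value, b_value) for every cell where the two grids differ."""
    mismatches = []
    for z, (plane_a, plane_b) in enumerate(zip(a, b)):
        for y, (row_a, row_b) in enumerate(zip(plane_a, plane_b)):
            for x, (va, vb) in enumerate(zip(row_a, row_b)):
                if va != vb:
                    mismatches.append((z, y, x, va, vb))
    return mismatches


# Tiny made-up 2x2x2 example: the two grids differ in a single cell.
cpu_result = [[[0, 1], [1, 0]], [[0, 0], [1, 1]]]
gpu_result = [[[0, 1], [1, 0]], [[0, 1], [1, 1]]]
print(diff_cells(cpu_result, gpu_result))  # [(1, 0, 1, 0, 1)]
```

Logging the coordinates of the mismatches per generation would show whether the wrong elements always appear at the same positions.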

My implementation doesn't need any synchronized access to resources. In a for loop, it calls the kernel with input A and output B, then calls the kernel again with input B and output A (maybe there's a race there, since the GPU may still be working on the first kernel call when the second one is issued?).
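
The alternating A/B scheme can be sketched in Python as below. A toy 1D rule (`rule90_into`, a hypothetical stand-in for the real 3D kernel) keeps it short. The point is only the buffer swap and the check that 6 generations in a row match 5 generations followed by 1 more, which must hold on any runtime that finishes each kernel call before the next one reads its output.

```python
def rule90_into(src, dst):
    """Toy stand-in for the kernel: each cell becomes the XOR of its two neighbours (wrapping)."""
    n = len(src)
    for i in range(n):
        dst[i] = src[i - 1] ^ src[(i + 1) % n]


def run(step_into, start, generations):
    """Ping-pong between two buffers: src -> dst, then swap roles each generation."""
    src = list(start)          # buffer "A", a copy so the caller's data is untouched
    dst = [0] * len(start)     # buffer "B"
    for _ in range(generations):
        step_into(src, dst)    # dst must be fully written before it is read as input
        src, dst = dst, src
    return src                 # after the final swap, src holds the newest generation


start = [0, 0, 0, 1, 0, 0, 0, 0]
six_in_a_row = run(rule90_into, start, 6)
five_then_one = run(rule90_into, run(rule90_into, start, 5), 1)
print(six_in_a_row == five_then_one)
```

If the equivalent comparison fails on the GPU but holds on the CPU, that points at the hand-off between consecutive kernel calls rather than at the rule itself.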

 

 

Windows Server 2008 x64
Radeon HD 4850
Catalyst 8.10
Brook+ and CAL 1.2.1 beta x64
Visual Studio 2008 SP1, release x64

2 Replies
Marix
Adept II

Funny, I started working on pretty much the same problem, except that I was using 1D streams to start with, essentially porting code that already existed for another framework. I too had working code on the CPU; for me, however, attempting to run a second generation on the GPU usually segfaults.


The really funny thing is that a colleague of mine is doing the same thing in CUDA, with a similar algorithm and similar results, completely independently of me (except for the initial problem to solve). His implementation works for a few generations, then elements start getting wrong values past some threshold. The CPU simulation of the code works fine in his case too.
