MatAle
Journeyman III

NBody GPU problem

 

Hello.

I am trying to adapt the source code of the NBody sample to fit my needs (forces, energy and distances between atoms). X, Y and Z are the coordinates, C is the charge, R (along with dx, dy and dz) holds the results, and "limit" is the number of atoms. Coordinates and results are stored in 1D arrays.

The following kernel code works correctly on the CPU (it uses all 8 cores), but it does not work correctly on the GPU: the results are large numbers or "-nan".

For example:

GPU (incorrect):

dx=-17935.105469, dy=4237.721680, dz=-23556.548828

dx=-nan, dy=-nan, dz=-nan

CPU (correct):

dx=1.314520, dy=-4.063989, dz=1.077236

dx=-1.314520, dy=4.063989, dz=-1.077236

 

The only difference between the CPU and GPU runs is that I change the device type passed to "clGetDeviceIDs" from "CL_DEVICE_TYPE_CPU" to "CL_DEVICE_TYPE_GPU".

Does anyone have any idea why this code doesn't work on GPU? I have a Radeon HD6850 1GB GPU, Intel i7 2600K and Gentoo Linux.

Here is my kernel code:

 

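/* Array subscripts were stripped from the original post. The [i] and [n]
   below are inferred from the loop and from the sample output above. */
__kernel void vector_add(__global float *X, __global float *Y, __global float *Z,
                         __global float *C, __global float *R, const int limit,
                         __global float *dx, __global float *dy, __global float *dz)
{
    int i = get_global_id(0);   /* one work-item per atom i */
    int n;
    float distance;
    float e, tx, ty, tz;
    float stx=0, sty=0, stz=0;
    float cutoff = 2.0;
    float cutx, f, cg, tf, grms=0.0;

    cutx = cutoff*cutoff;

    for (n = i+1; n < limit; n++) {
        /* squared distance between atoms i and n */
        distance = (pow(X[i]-X[n],2) + pow(Y[i]-Y[n],2) + pow(Z[i]-Z[n],2));
        if (distance <= cutoff) {
            f = (1.0-distance)/cutx;
            cg = (C[i]*C[n])/sqrt(distance);
            e = cg*f*f;
            R[i] = distance;    /* subscript lost in the post; [i] is a guess */
            tf = -e/(distance-4.0*cg*f/cutx);
            tx = tf*(X[i]-X[n]);
            ty = tf*(Y[i]-Y[n]);
            tz = tf*(Z[i]-Z[n]);
            /* equal and opposite contribution on atom n */
            dx[n] = dx[n] - tx;
            dy[n] = dy[n] - ty;
            dz[n] = dz[n] - tz;
            stx = stx + tx;
            sty = sty + ty;
            stz = stz + tz;
        }
        dx[i] = dx[i] + stx;
        dy[i] = dy[i] + sty;
        dz[i] = dz[i] + stz;
        grms = grms + pow(dx[i],2) + pow(dy[i],2) + pow(dz[i],2);
    }
}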

3 Replies
Bdot
Adept III

The numbers look like uninitialized memory. Did you define the memory buffers (in/out/in-out) according to their use? Do you copy all the buffers you need to the GPU and back?
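A minimal sketch of why uninitialized accumulators matter, written in plain C with no OpenCL calls so it can be compiled anywhere. The helper names and the pair term are hypothetical and are not the poster's force formula. The update pattern (subtract from atom n, add to atom i) is an assumption about how the kernel accumulates into dx. Because the routine only adds to dx, whatever the buffer held on entry ends up in the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the post: accumulates a placeholder pair
   term into dx using the assumed update pattern (dx[n] -= t; dx[i] += t).
   It never assigns dx from scratch, so stale contents leak into the result. */
static void accumulate_pairs(const float *x, float *dx, int limit)
{
    assert(x != NULL && dx != NULL && limit >= 0);
    for (int i = 0; i < limit; i++) {
        for (int n = i + 1; n < limit; n++) {
            float t = x[i] - x[n];   /* placeholder, not the real force term */
            dx[n] -= t;
            dx[i] += t;
        }
    }
}

/* Same accumulation, but the output array is zeroed first. */
static void accumulate_pairs_from_zero(const float *x, float *dx, int limit)
{
    assert(dx != NULL && limit >= 0);
    /* All-bits-zero is 0.0f on IEEE 754 platforms. */
    memset(dx, 0, (size_t)limit * sizeof *dx);
    accumulate_pairs(x, dx, limit);
}
```

With two atoms, accumulate_pairs_from_zero gives equal and opposite values, while accumulate_pairs on an array that still holds old data returns those old values plus the pair term. In an OpenCL host program the corresponding step would be to fill the dx, dy and dz buffers with zeros before the kernel is enqueued, for example by writing a zeroed host array with clEnqueueWriteBuffer.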


Thank you for your reply. 

I have already found the solution: the 1D "dx", "dy" and "dz" arrays were not initialized correctly. Now the code works on both the GPU and the CPU. But I have a new problem: the GPU is no faster than the CPU (or is even slower). I do not think the cause is transferring the arrays to the GPU and back, but rather the "distribution" of threads. I am currently a bit lost as to how to optimize the code so that it runs faster on the GPU.

Does anyone have an idea?
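On the "distribution" of threads, one thing can be read directly from the kernel: its inner loop runs n from i+1 up to limit, so the amount of work per work-item depends on i. The plain C sketch below only counts those iterations. The function names are hypothetical, and whether this imbalance is what slows the GPU run is not established in this thread.

```c
#include <assert.h>

/* Inner-loop iterations for work-item i, given a loop that runs
   n from i + 1 up to (but not including) limit. */
static long iterations_for_item(long i, long limit)
{
    assert(i >= 0 && limit >= 0);
    return (i + 1 < limit) ? (limit - 1 - i) : 0;
}

/* Total iterations over all work-items: one per unordered pair of atoms. */
static long total_iterations(long limit)
{
    long total = 0;
    for (long i = 0; i < limit; i++)
        total += iterations_for_item(i, limit);
    return total;
}
```

For 1000 atoms, work-item 0 performs 999 iterations and work-item 999 performs none, so the first work-items carry most of the load.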


Edit.
