cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

ljbade
Journeyman III

Kernel does not run when outside profiler

I am trying to port sailfish (http://gitorious.org/sailfish) to Windows (http://gitorious.org/~ljbade/sailfish/ljbades-sailfish

But when I run the example scripts I get only black pixel output, and the simulation frame counter goes crazy fast.

It seems the kernel is not being run, despite the enqueued kerned and the enqueued read completing without generating any errors.

But if I run the script via the profiler (clprofile.exe), it works fine!

I am running this on the CPU (Phenom II 720 BE).

Is there any sort of debug log I can look at?

Any ideas what the issue might be?

0 Likes
6 Replies
genaganna
Journeyman III

Originally posted by: ljbade I am trying to port sailfish (http://gitorious.org/sailfish) to Windows (http://gitorious.org/~ljbade/sailfish/ljbades-sailfish

 

But when I run the example scripts I get only black pixel output, and the simulation frame counter goes crazy fast.

 

It seems the kernel is not being run, despite the enqueued kerned and the enqueued read completing without generating any errors.

 

But if I run the script via the profiler (clprofile.exe), it works fine!

 

I am running this on the CPU (Phenom II 720 BE).

 

Is there any sort of debug log I can look at?

 

Any ideas what the issue might be?

 

Ljbade,

         clprofile is vaild only for GPU.  Are you sure that runs on CPU?

0 Likes

Yes it is running on the CPU (all 4 cores go to 90% load).

And considering I don't even have a OpenCL compatible GPU, it is impossible for it to use my GPU.

I have also modified pyopencl to use the pfn_notfy on clCreateContext, and that never gets called either...

0 Likes

Originally posted by: ljbade Yes it is running on the CPU (all 4 cores go to 90% load).

 

And considering I don't even have a OpenCL compatible GPU, it is impossible for it to use my GPU.

 

I have also modified pyopencl to use the pfn_notfy on clCreateContext, and that never gets called either...

 

It is very difficult for me to look .py code. It would be good if you give code in C/C++.

What do you mean by running script via clprofile.exe?  Handling error code returned by API is only the way to know what is going wrong.

0 Likes

By clprofile.exe, mean the one in C:\Program Files (x86)\ATI Stream\Tools\ATI Stream Profiler 1.0\x86

 

I simply run it like this from command prompt: clprofile.ex python.exe C:\sailfish\examples\lbm_ldc.py

That works fine (but slower due to profile overhead), but if I run

python.exe C:\sailfish\examples\lbm_ldc.py

I get black pixels and the kernel never runs (no load on CPU)

The code\command doesn't change, only the clprofile hosting enviroment. So there is something clprofile is doing that causes kernel to run...

The API never seems to return an error code either... thats why I was wondering if ATI Stream can generate a debug log.

I will try to create some pseudo-C code for you to look at.

0 Likes

OK here is the autogenerated opencl code.

This project appears to use a template generating system to create custom kernels on the fly depending on the dimensions etc. that can target both cuda and opencl with the same code.

 

#define BLOCK_SIZE 64 #define DIST_SIZE 9216 #define GEO_FLUID 0 #define GEO_BCV 2 #define GEO_BCP 2 #define DT 1.0f __constant float tau = 0.503f; // relaxation time __constant float visc = 0.001f; // viscosity __constant float geo_params[1] = { 0}; // geometry parameters typedef struct Dist { float fC; float fE; float fN; float fW; float fS; float fNE; float fNW; float fSW; float fSE; } Dist; // // Copy the idx-th distribution from din into dout. // inline void getDist(Dist *dout, __global float *din, int idx) { dout->fC = din[idx + DIST_SIZE*0]; dout->fE = din[idx + DIST_SIZE*1]; dout->fN = din[idx + DIST_SIZE*2]; dout->fW = din[idx + DIST_SIZE*3]; dout->fS = din[idx + DIST_SIZE*4]; dout->fNE = din[idx + DIST_SIZE*5]; dout->fNW = din[idx + DIST_SIZE*6]; dout->fSW = din[idx + DIST_SIZE*7]; dout->fSE = din[idx + DIST_SIZE*8]; } inline bool isFluidNode(int type) { return type == 0; } inline bool isWallNode(int type) { return type == 1; } inline bool isFluidOrWallNode(int type) { return type <= 1; } // This assumes we're dealing with a wall node. inline bool isVelocityNode(int type) { return (type >= 2) && (type < GEO_BCP); } inline bool isVelocityOrPressureNode(int type) { return (type >= 2); } inline bool isPressureNode(int type) { return (type >= 2); } inline void decodeNodeType(int nodetype, int *orientation, int *type) { *orientation = nodetype & 7; *type = nodetype >> 3; } inline void bounce_back(Dist *fi) { float t; t = fi->fE; fi->fE = fi->fW; fi->fW = t; t = fi->fN; fi->fN = fi->fS; fi->fS = t; t = fi->fNE; fi->fNE = fi->fSW; fi->fSW = t; t = fi->fNW; fi->fNW = fi->fSE; fi->fSE = t; } inline void compute_macro_quant(Dist *fi, float *rho, float *v) { *rho = fi->fC + fi->fE + fi->fN + fi->fNE + fi->fNW + fi->fS + fi->fSE + fi->fSW + fi->fW; v[0] = (fi->fE + fi->fNE + fi->fSE - fi->fNW - fi->fSW - fi->fW)/(*rho) ; v[1] = (fi->fN + fi->fNE + fi->fNW - fi->fS - fi->fSE - fi->fSW)/(*rho) ; } // // Get macroscopic density rho and velocity v given a distribution fi, and // the node class node_type. // inline void getMacro(Dist *fi, int node_type, int orientation, float *rho, float *v) { if (isFluidOrWallNode(node_type) || orientation == 0) { compute_macro_quant(fi, rho, v); if (isWallNode(node_type)) { v[0] = 0.0f; v[1] = 0.0f; } } else if (isVelocityNode(node_type)) { compute_macro_quant(fi, rho, v); } else if (isPressureNode(node_type)) { compute_macro_quant(fi, rho, v); } if (!isWallNode(node_type)) { v[0] += 0.00000018903591682420f; } } inline void boundaryConditions(Dist *fi, int node_type, int orientation, float *rho, float *v) { if (isWallNode(node_type)) { bounce_back(fi); } } // // A kernel to update the position of tracer particles. // // Each thread updates the position of a single particle using Euler's algorithm. // __kernel void LBMUpdateTracerParticles(__global float *dist, __global int *map, __global float *x, __global float *y ) { float rho, v[2]; int gi = get_global_id(0); float cx = x[gi]; float cy = y[gi]; int ix = (int)(cx); int iy = (int)(cy); // Sanity checks. if (iy < 0) iy = 0; if (ix < 0) ix = 0; if (ix > 191) ix = 191; if (iy > 47) iy = 47; int idx = ix + 192*iy; Dist fc; // getDist(&fc, dist, idx); fc.fC = dist[idx + DIST_SIZE*0]; fc.fE = dist[idx + DIST_SIZE*1]; fc.fN = dist[idx + DIST_SIZE*2]; fc.fW = dist[idx + DIST_SIZE*3]; fc.fS = dist[idx + DIST_SIZE*4]; fc.fNE = dist[idx + DIST_SIZE*5]; fc.fNW = dist[idx + DIST_SIZE*6]; fc.fSW = dist[idx + DIST_SIZE*7]; fc.fSE = dist[idx + DIST_SIZE*8]; int type, orientation; decodeNodeType(map[idx], &orientation, &type); getMacro(&fc, type, orientation, &rho, v); cx = cx + v[0] * DT; cy = cy + v[1] * DT; // Periodic boundary conditions. if (cx > 192) cx = 0.0f; if (cy > 48) cy = 0.0f; if (cx < 0.0f) cx = (float)192; if (cy < 0.0f) cy = (float)48; x[gi] = cx; y[gi] = cy; } // // Performs the relaxation step in the BGK model given the density rho, // the velocity v and the distribution fi. // void BGK_relaxate(float rho, float *v, Dist *fi, int node_type) { Dist feq; #define vx v[0] #define vy v[1] #define vz v[2] feq.fC = 4*rho/9 + 4*rho*(-3*vx*vx/2 - 3*vy*vy/2)/9 ; feq.fE = rho/9 + rho*(3*vx*(1 + vx) - 3*vy*vy/2)/9 ; feq.fN = rho/9 + rho*(3*vy*(1 + vy) - 3*vx*vx/2)/9 ; feq.fW = rho/9 + rho*(-3*vx*(1 - vx) - 3*vy*vy/2)/9 ; feq.fS = rho/9 + rho*(-3*vy*(1 - vy) - 3*vx*vx/2)/9 ; feq.fNE = rho/36 + rho*(3*vx*(1 + vx) + 3*vy*(1 + vy + 3*vx))/36 ; feq.fNW = rho/36 + rho*(-3*vx*(1 - vx) + 3*vy*(1 + vy - 3*vx))/36 ; feq.fSW = rho/36 + rho*(-3*vx*(1 - vx) - 3*vy*(1 - vy - 3*vx))/36 ; feq.fSE = rho/36 + rho*(-3*vy*(1 - vy + 3*vx) + 3*vx*(1 + vx))/36 ; fi->fC += (feq.fC - fi->fC) / tau; fi->fE += (feq.fE - fi->fE) / tau; fi->fN += (feq.fN - fi->fN) / tau; fi->fW += (feq.fW - fi->fW) / tau; fi->fS += (feq.fS - fi->fS) / tau; fi->fNE += (feq.fNE - fi->fNE) / tau; fi->fNW += (feq.fNW - fi->fNW) / tau; fi->fSW += (feq.fSW - fi->fSW) / tau; fi->fSE += (feq.fSE - fi->fSE) / tau; if (!isWallNode(node_type)) { // External acceleration. #define eax 0.00000037807183364839f #define eay 0.00000000000000000000f #define eaz 0.00000000000000000000f float pref = rho * (3.0f - 3.0f/(2.0f * tau)); fi->fC += 4*pref*(-eax*vx - eay*vy)/9 ; fi->fE += pref*(eax*(1 + 2*vx) - eay*vy)/9 ; fi->fN += pref*(eay - eax*vx + 2*eay*vy)/9 ; fi->fW += pref*(-eax - eay*vy + 2*eax*vx)/9 ; fi->fS += pref*(-eay - eax*vx + 2*eay*vy)/9 ; fi->fNE += pref*(eax*(1 + 2*vx + 3*vy) + eay*(1 + 2*vy + 3*vx))/36 ; fi->fNW += pref*(eay*(1 - 3*vx + 2*vy) - eax*(1 - 2*vx + 3*vy))/36 ; fi->fSW += pref*(-eax - eay + vx*(2*eax + 3*eay) + vy*(2*eay + 3*eax))/36 ; fi->fSE += pref*(eax*(1 - 3*vy + 2*vx) - eay*(1 - 2*vy + 3*vx))/36 ; } #undef vx #undef vy #undef vz } __kernel void LBMCollideAndPropagate(__global int *map, __global float *dist_in, __global float *dist_out, __global float *orho, __global float *ovx, __global float *ovy, int save_macro) { int lx = get_local_id(0); // ID inside the current block int gx = get_global_id(0); int gy = get_group_id(1); int gi = gx + 192*gy; // shared variables for in-block propagation __local float prop_fE[BLOCK_SIZE]; __local float prop_fNE[BLOCK_SIZE]; __local float prop_fSE[BLOCK_SIZE]; __local float prop_fW[BLOCK_SIZE]; __local float prop_fNW[BLOCK_SIZE]; __local float prop_fSW[BLOCK_SIZE]; // cache the distributions in local variables Dist fi; getDist(&fi, dist_in, gi); int type, orientation; decodeNodeType(map[gi], &orientation, &type); // macroscopic quantities for the current cell float rho, v[2]; getMacro(&fi, type, orientation, &rho, v); boundaryConditions(&fi, type, orientation, &rho, v); barrier(CLK_LOCAL_MEM_FENCE); // only save the macroscopic quantities if requested to do so if (save_macro == 1) { orho[gi] = rho; ovx[gi] = v[0]; ovy[gi] = v[1]; } if (isFluidNode(type)) { BGK_relaxate(rho, v, &fi, type); } // update the 0-th direction distribution dist_out[gi] = fi.fC; // E propagation in shared memory if (lx < 63) { prop_fE[lx+1] = fi.fE; prop_fNE[lx+1] = fi.fNE; prop_fSE[lx+1] = fi.fSE; // E propagation in global memory (at right block boundary) } else if (gx < 191) { dist_out[gi + 9217] = fi.fE; if (gy < 47) { dist_out[gi + 46273] = fi.fNE; } if (gy > 0) { dist_out[gi + 73537] = fi.fSE; } } // periodic boundary conditions in the X direction else { dist_out[gi + 9025] = fi.fE; if (gy < 47) { dist_out[gi + 46081] = fi.fNE; } if (gy > 0) { dist_out[gi + 73345] = fi.fSE; } } // W propagation in shared memory if (lx > 0) { prop_fW[lx-1] = fi.fW; prop_fNW[lx-1] = fi.fNW; prop_fSW[lx-1] = fi.fSW; // W propagation in global memory (at left block boundary) } else if (gx > 0) { dist_out[gi + 27647] = fi.fW; if (gy < 47) { dist_out[gi + 55487] = fi.fNW; } if (gy > 0) { dist_out[gi + 64319] = fi.fSW; } } // periodic boundary conditions in the X direction else { dist_out[gi + 27839] = fi.fW; if (gy < 47) { dist_out[gi + 55679] = fi.fNW; } if (gy > 0) { dist_out[gi + 64511] = fi.fSW; } } barrier(CLK_LOCAL_MEM_FENCE); // Save locally propagated distributions into global memory. // The leftmost thread is not updated in this block. if (lx > 0) { dist_out[gi + 9216] = prop_fE[lx]; if (gy < 47) { dist_out[gi + 46272] = prop_fNE[lx]; } if (gy > 0) { dist_out[gi + 73536] = prop_fSE[lx]; } } // Propagation in directions orthogonal to the X axis (global memory) if (gy < 47) { dist_out[gi + 18624] = fi.fN; } if (gy > 0) { dist_out[gi + 36672] = fi.fS; } // The rightmost thread is not updated in this block. if (lx < 63) { dist_out[gi + 27648] = prop_fW[lx]; if (gy < 47) { dist_out[gi + 55488] = prop_fNW[lx]; } if (gy > 0) { dist_out[gi + 64320] = prop_fSW[lx]; } } }

0 Likes

Originally posted by: ljbade OK here is the autogenerated opencl code.

 

This project appears to use a template generating system to create custom kernels on the fly depending on the dimensions etc. that can target both cuda and opencl with the same code.

 

 



Just giving kernel code would be help me.  I installed python on mysystem and executed lbm_ldc.py. 

I am getting following

 

C:\Documents and Settings\Administrator\Desktop\sailfish-ljbades-sailfish\examples>python.exe lbm_ldc.py
Traceback (most recent call last):
  File "lbm_ldc.py", line 3, in <module>
    import numpy
ImportError: No module named numpy

 

Could you please give me steps required to run your lbm_ldc.py samples?

0 Likes