How about trying the tool called Kernel Analyzer?
The Kernel Analyzer gives more or less the same results: the kernel either compiles fine, fails to compile with no build information, or crashes the Kernel Analyzer.
I've been trying to narrow down the problem with the KernelAnalyzer2 program, and here is what I've found. My program propagates a system through time, using a fourth-order Runge-Kutta algorithm as part of the kernel. As such, it has to evaluate some equations multiple times to calculate velocities and whatnot; here those equations are D_Theta and D_Phi. When it calls them only the first time, everything works and the kernel compiles (in the sample code below I have the other three calls commented out; this version compiles). When I have it call them the second, third and fourth times as well, the kernel won't compile and no error is given.
void RK4(Parameters mp, float* DT, float j, float* theta, float* phi, float* time)
{
    float th = *theta;
    float ph = *phi;
    float t = *time;
    float dt = *DT;
    float k11, k12, k21, k22, k31, k32, k41, k42;
    k11 = k12 = k21 = k22 = k31 = k32 = k41 = k42 = 0;
    float dtheta = 0;
    float dphi = 0;
    float tth = th;
    float tph = ph;
    k11 = D_Theta(mp, j, tth, tph);
    k12 = D_Phi(mp, j, tth, tph);
    tth = th + k11 * dt / 2;
    tph = ph + k12 * dt / 2;
    //k21 = D_Theta(mp, j, tth, tph);
    //k22 = D_Phi(mp, j, tth, tph);
    tth = th + k21 * dt / 2;
    tph = ph + k22 * dt / 2;
    //k31 = D_Theta(mp, j, tth, tph);
    //k32 = D_Phi(mp, j, tth, tph);
    tth = th + k31 * dt;
    tph = ph + k32 * dt;
    //k41 = D_Theta(mp, j, tth, tph);
    //k42 = D_Phi(mp, j, tth, tph);
    dtheta = (k11 + 2 * k21 + 2 * k31 + k41) * dt / 6;
    dphi = (k12 + 2 * k22 + 2 * k32 + k42) * dt / 6;
    th += dtheta;
    ph += dphi;
    *theta = th;
    *phi = ph;
    *time = t;
}
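For reference, here is a minimal sketch in plain C (not OpenCL C) of the same fourth-order Runge-Kutta step with the four stages written as a loop, so that each derivative function appears at a single call site instead of four. Everything in it is an assumption made for illustration: d_theta_demo and d_phi_demo are hypothetical stand-ins for D_Theta and D_Phi, the mp and j arguments are dropped, and the step advances time by dt, which the function above does not do. Whether this shape changes anything for the compiler on a Cypress card is untested.

```c
#include <math.h>

/* Hypothetical stand-ins for D_Theta and D_Phi (the real bodies are not
   shown in the thread). Together they describe a simple rotation:
   d(theta)/dt = phi, d(phi)/dt = -theta. */
static float d_theta_demo(float theta, float phi) { (void)theta; return phi; }
static float d_phi_demo(float theta, float phi)   { (void)phi; return -theta; }

/* One classical RK4 step for the coupled pair (theta, phi).
   The stages run in a loop, so each derivative has one call site. */
static void rk4_step_demo(float *theta, float *phi, float *time, float dt)
{
    const float th = *theta;
    const float ph = *phi;
    float k_th = 0.0f, k_ph = 0.0f;     /* slopes from the previous stage */
    float sum_th = 0.0f, sum_ph = 0.0f; /* weighted sums of the slopes */

    for (int s = 0; s < 4; ++s) {
        /* Fraction of dt applied to the previous slope: 0, 1/2, 1/2, 1. */
        const float offset = (s == 0) ? 0.0f : (s == 3) ? 1.0f : 0.5f;
        /* Stage weights 1, 2, 2, 1 (divided by 6 after the loop). */
        const float weight = (s == 0 || s == 3) ? 1.0f : 2.0f;

        const float tth = th + offset * dt * k_th;
        const float tph = ph + offset * dt * k_ph;
        k_th = d_theta_demo(tth, tph);
        k_ph = d_phi_demo(tth, tph);
        sum_th += weight * k_th;
        sum_ph += weight * k_ph;
    }

    *theta = th + sum_th * dt / 6.0f;
    *phi   = ph + sum_ph * dt / 6.0f;
    *time += dt; /* assumption: the step itself advances time */
}
```

With the stand-in derivatives, starting from theta = 1 and phi = 0 and taking 100 steps of dt = 0.01 should track theta = cos(t) and phi = -sin(t) closely, which gives a quick sanity check that the loop form matches the unrolled one.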
Hi, is there a private array allocated in D_Theta or D_Phi?
No, both D_Theta and D_Phi are structured like
float d_theta = 0;
The "do something" part does contain several if statements, but otherwise it's just simple multiplications, divisions, and trig functions.
Is it possible for you to upload the code so that we can reproduce the issue?
Otherwise, when it compiles, can it run on that GPU?
Here is the code. There is a lot to it, but the clearest change that controls whether the kernel compiles is the set of commented-out calls in the RK4 function discussed earlier. When it does compile, it runs just fine, although to make it work with those calls active I have to eliminate other portions of the program. In essence, all the parts of the program work fine individually, but when I try to put too many of them together it doesn't compile.
Oh, also: I'm building with the build options -cl-single-precision-constant and -cl-no-signed-zeros (not sure if either of these is really needed), and I'm building on a Cypress card.
SwitchingSimple.cl.zip 2.8 KB
Hmm, it seems to be a hardware-specific issue. The code compiles successfully on an NV card, but crashes on the ATI one here.