Hello everyone,
I am trying to figure out why my OpenCL program crashes, and the behaviour seems very strange. I ran these tests:
#define TTPROGRAMDATA __global struct TProgramData
...
// WORK OK
int Testparseg1(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos)
{
    int cnt=0;
    int i,ni=ProgramData->NumIstructions;//,imin;
    float v,vmin;
    bool fnd=false;
    float MaxDistVoxel=ProgramData->VoxelData.VoxelRadius*2.0f;
    float vraddia=ProgramData->VoxelData.VoxelRadius;
    i=0;
    while (i<ni&&(!fnd))
    {
        v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1 );
        if (v<=vraddia)
            fnd=true;
        i++;
    }
    if (fnd)
    {
        vraddia*=3.0f;
        for (i=0;i<ni;i++)
        {
            v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1 ); //ONLY 1 CALL
            if (v<=vraddia)
                cnt++;
        }
    }
    return cnt;
}
// KO
int Testparseg1(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos)
{
    int cnt=0;
    int i,ni=ProgramData->NumIstructions;//,imin;
    float v,vmin;
    bool fnd=false;
    float MaxDistVoxel=ProgramData->VoxelData.VoxelRadius*2.0f;
    float vraddia=ProgramData->VoxelData.VoxelRadius;
    i=0;
    while (i<ni&&(!fnd))
    {
        v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1 );
        if (v<=vraddia)
            fnd=true;
        i++;
    }
    if (fnd)
    {
        vraddia*=3.0f;
        // ----different rows---
        i=0;
        v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1 );
        if (v<=vraddia)
            cnt++;
        //-----------------------------
        for (i=1;i<ni;i++)
        {
            v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1 );
            if (v<=vraddia)
                cnt++;
        }
    }
    return cnt;
}
...
__kernel void Build_testg1(__constant struct TInputData *InputData,TTPROGRAMDATA *ProgramData,
                           TVOXELSPTR Voxels)
{
    int iVoxel=get_global_id(0);
#ifdef _DEBUG
    if (iVoxel>=MAXNUMVOXELS)
        return ;
#endif
    float3 vp=GetVoxelPnt(ProgramData,iVoxel);
    Voxels[iVoxel].ListPtr=Testparseg1(ProgramData,InputData->ExtSPrec,vp);
}
The first version of Testparseg1 works; the only difference in the second one is an extra call placed before the for loop.
Both versions work when ProgramData->NumIstructions < 2000, that is, when the for loop runs fewer than about 2000 iterations.
Any suggestions?
Thank you,
Denis.
Message edited by Denis A.
Attached is an example program that crashes the driver.
The sample is built with Visual Studio 2010.
The bin folder contains a precompiled version with all the files the sample requires.
To compile the project you need boost 1.55 and glm 0.9.4.6.
I hope this helps to find where the problem is.
Thank you,
Denis.
Hi Denis,
It is difficult to tell where the error could be from the information you have posted. I have a few questions and suggestions:
1. Is any message generated when it crashes? If so, what is the message?
2. Have you tried compiling the code without optimization (i.e. with the flag "-cl-opt-disable"), and do you get the same error? If not, please check. I just want to find out whether it is an optimization-related issue or something else.
3. Please make sure you are using the latest driver and APP SDK. If possible, reinstall them and check whether the problem still occurs.
It would be a great help if you could provide a sample project (host + GPU) so that we can reproduce the error. Also, please let us know your system setup details: CPU, GPU, SDK, driver, OS (Windows/Linux, 32/64-bit), etc.
Thanks,
Thank you, Dipak, for your answer.
1) The only message comes from Windows, when it reloads the graphics driver after the crash.
2) No. I will try. Thank you for the suggestion.
3) I will also try updating the driver.
The project is no longer small enough to hand over as a sample, but if I do not find a solution I will certainly write a small test program.
Thanks,
Denis.
After the tests:
Before updating the driver (release 13), building the program with the flag "-cl-opt-disable" made the compilation fail.
After updating the driver to release 14, compilation is faster, but with the flag "-cl-opt-disable" the program crashes.
With the new driver the program crashes in every test I ran, even with ProgramData->NumIstructions < 100.
It seems as if the program accesses memory out of range...
I will run more tests.
Did you run any further tests? Please share your observations and a simple reproducible test case so that we can investigate the problem.
Hi Prasad Hariharan,
I ran hundreds of tests. What I find is that the test fails when the outermost loop has more than about 1000 iterations. I am not accessing memory out of bounds, but the driver crashes. I cannot compile without optimization because the compiler crashes. I tried to reproduce it with simple examples, but with simple code everything works fine. CodeXL is not able to compile the program (the compilation phase never ends). The Intel compiler works fine. I don't know what the best way to proceed is.
One characteristic of the code is the presence of nested loops. In the past, to get the code to compile, I removed the outermost loop; otherwise the AMD compiler crashed, and the NVIDIA compiler was not able to build the code either.
I am now trying a test in which I restructure the code to remove another outer loop.
The code works fine with a loop of about 100 iterations and driver 13.
I will post the part of the code where I get the problem.
I can also give you the entire project, but I cannot make it publicly available.
Thank you for your interest.
This is the code that shows the problem:
//#define PROGRAMDATA_READONLYMEMORY
#define _DEBUG
//#define LOWPROFILE
#define MAXNUMVOXELS 16000000
//#define VOXELSSIZE 8000000
#define MAXNUMVOXELSLIST 80000000
#define VOXELSLISTSIZE 320000000
#define MAXNUMPIXEL 2000000
#define FLAGS_AO 0x00000001
#define FLAGS_SHADOW 0x00000002
#define vec2 float2
#define vec3 float3
#define vec4 float4
#define mat2 float2x2
#define mat4 float4x4
#define TVOXELSPTR __global struct TVoxel *
#define TCOLOR float3
#define FresnelF 0.2f
#define SpecularF 16
#ifdef PROGRAMDATA_READONLYMEMORY
#define TTG1 __constant struct TG1_
#define TToolDef __constant struct ToolDef
#define TTPROGRAMDATA __constant struct TProgramData
#else
#define TTG1 __global struct TG1_
#define TToolDef __global struct ToolDef
#define TTPROGRAMDATA __global struct TProgramData
#endif
#define MAXTOOLS 10
#define MAXISTRUCTIONS 100000
//#define GBUFFERTEST if ((InputData->Moved==0)&&GBuffer[get_global_id(0)].ExLen>=1.0f&&(InputData->ShowTool<1||InputData->CurG>=InputData->NumIstructions)) return ;
#define GBUFFERTEST
#define GETGBUFFERPTR &(GBuffer[get_global_id(0)])
#define GBUFFERNULLTEST (GBuffer!=0)
struct __attribute__ ((packed)) TVoxel
{
int ListPtr;
float dist;
} ; //8
struct __attribute__ ((packed)) TGCell
{
int GIndex;
float ExLen;
} ;
struct __attribute__ ((packed)) ToolPoint
{
float3 pos;
float2 cb;
} ;
struct __attribute__ ((packed)) ToolDef
{
int tooltype;
float len;
float radius;
} ;
#define istrType_extrude2x 0
#define istrType_extrude5x 1
#define istrType_revolve2x 2
#define istrType_revolve2xi 3
struct __attribute__ ((packed)) TG1_
{
int tool;
int istrType;
float StartTime;
struct __attribute__ ((packed)) ToolPoint p1;
float EndTime;
struct __attribute__ ((packed)) ToolPoint p2;
float len1;
int filler;
} ;
struct __attribute__ ((packed)) TVoxelData
{
float VoxelSize;
float VoxelRadius;
int NumVoxelX;
int NumVoxelY;
int NumVoxelZ;
};
struct __attribute__ ((packed)) TInputData
{
float2 iResolution;
//uint CurSizeX;
//uint CurSizeY;
int4 iMouse;
int4 iDate;
float3 iRayOrigin;
float3 iRayDir;
float3 cu;
float3 cv;
float3 cw;
int Moved;
int ShowTool;
int CurG;
int Flags;
float ifView;
float ExtSPrec;
float RayCastMaxDist;
float RayCastPrec;
float3 SoftShadowLightDir;
float SoftShadowMint;
float SoftShadowMaxt;
float SoftShadowStep;
float SoftShadowK;
int AONumIterations;
float2 SubSamplingXY;
int SubSamplingIndex;
int SubSamplingSize;
float iGlobalTime;
};
struct __attribute__ ((packed)) TProgramData
{
struct TVoxelData VoxelData;
float3 PanelSize;
int NumIstructions;
struct ToolDef __attribute__ ((packed)) Tools[MAXTOOLS];
struct TG1_ __attribute__ ((packed)) g1[MAXISTRUCTIONS] ;
} ;
vec3 opMove(vec3 p,vec3 m)
{
return p-m;
}
vec3 opMoveY(vec3 p,float y)
{
p.y-=y;
return p;
//return (vec3)(p.x,p.y-y,p.z);
}
vec3 opRotateY(vec3 p,float a)
{
a=-a;
float3 v3;
float ca=cos(a);
float sa=sin(a);
v3.x=p.x*ca-p.z*sa;
v3.y=p.y;
v3.z=p.z*ca+p.x*sa;
return v3;
//return vec3(p.x*cos(a)-p.z*sin(a),p.y,p.z*cos(a)+p.x*sin(a));
}
vec3 opRotateZ(vec3 p,float a)
{
a=-a;
float3 v3;
float ca=cos(a);
float sa=sin(a);
v3.x=p.x*ca-p.y*sa;
v3.y=p.y*ca+p.x*sa;
v3.z=p.z;
return v3;
//return vec3(p.x*cos(a)-p.y*sin(a),p.y*cos(a)+p.x*sin(a),p.z);
}
float opExt1dir(float p, float x,float e2)
{
float r;
if (p<x)
r=p;
else
{
if (p>x+e2)
r=p-e2;
else
r=x;
}
return r;
}
vec3 opExtndir(vec3 p, vec3 x,vec3 e2)
{
vec3 r;
r.x=opExt1dir(p.x,x.x,e2.x);
r.y=opExt1dir(p.y,x.y,e2.y);
r.z=opExt1dir(p.z,x.z,e2.z);
return r;
}
float sdSphere( vec3 p, float s )
{
return length(p)-s;
}
float sdToolballg1_test(vec3 pos,float t,TTG1 *g1,TToolDef *tool)
{
float t1=1.f-t;
return sdSphere(
opExtndir(
opRotateZ(
opRotateY(
opMoveY(opMove(pos,(pos-0.1f)*t1+(pos+0.1f)*t),0.1f),
0.1f*t1+0.1f*t),
0.2f*t1+0.2f*t),
(vec3)(0.0f,0.f,0.f),(vec3)(0.f,0.3f,0.f)),
0.1f );
}
float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)
{
float d1,d2,dmin;//,sprec;
//int i;
dmin=sdToolballg1_test(pos,t,0,0);
//min=dmin+1.f;
while (s>sprec)
{
d2=sdToolballg1_test(pos,t+s,0,0);
d1=sdToolballg1_test(pos,t-s,0,0);
if (d1<d2)
{
if (d1<dmin)
{
dmin=d1;
t-=s;
}
}
else
{
if (d2<dmin)
{
dmin=d2;
t+=s;
}
}
s*=0.5f;
}
return dmin;
}
float vTooltball_test(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos,TTG1 *g1)
{
float _fv=1.0f;
#ifdef _DEBUG
if (!g1)
return 0;
if (g1->tool<0)
return 0;
#endif
return sdTooltballe_test(ProgramData,sprec,pos,_fv*0.5f,_fv*0.25f,g1);
}
int Test2parseg1(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos,TTG1 *pg1)
{
    int i=0,cnt=0;
    float v=0.0;
    float MaxDistVoxel=ProgramData->VoxelData.VoxelRadius*2.0f;
    float vraddia=ProgramData->VoxelData.VoxelRadius;
    //using only this code it works
    vraddia*=3.0f;
    for (i=0;i<ProgramData->NumIstructions;i++)
    {
        // note: &pg1 is the address of the local pointer variable, not the
        // TTG1 * that vTooltball_test declares; pg1 itself may be what was meant
        v+=vTooltball_test( ProgramData,sprec,pos,&pg1 );
        vraddia+=v;
        if (v>-1000000||vraddia>0)
            cnt++;
    }
    /*
    ---------------------------------------------------------------------------------------------
    ---------------------------------------------------------------------------------------------
    ----WITH THIS CODE THE DRIVER CRASH------------------------------
    ---------------------------------------------------------------------------------------------
    ---------------------------------------------------------------------------------------------
    for (i=1;i<ProgramData->NumIstructions;i++)
    {
        v+=vTooltball_test( ProgramData,sprec,pos,&pg1 );
        vraddia+=v;
        if (v>-1000000||vraddia>0)
            cnt++;
    }
    ---------------------------------------------------------------------------------------------
    */
    return cnt;
}
__kernel void Build_testg2(__constant struct TInputData *InputData,TTPROGRAMDATA *ProgramData,
                           TVOXELSPTR Voxels,TTG1 *pg1)
{
    int iVoxel=get_global_id(0);
#ifdef _DEBUG
    if (iVoxel>=MAXNUMVOXELS)
        return ;
#endif
    float3 vp=GetVoxelPnt(ProgramData,iVoxel);
    Voxels[iVoxel].ListPtr=Test2parseg1(ProgramData,InputData->ExtSPrec,vp,pg1);
}
I hope this is enough to find a possible cause of the problem.
I am using a 7970 for the tests.
The complete .cl code is about 3000 lines. Is that too much for an OpenCL program?
Thank you,
Denis.
Hi Denis,
Thanks for capturing and posting the problematic code. We'll try to compile your code and let you know our observations. Meanwhile, please let us know your system setup details: CPU, GPU, SDK, driver, OS (Windows/Linux, 32/64-bit), etc.
Regards,
Hi Dipak,
thank you for your support.
I am using an Intel i7 930 processor with 6 GB of RAM, a Gigabyte HD 7970 GPU, Windows 7 64-bit as the operating system, and the beta AMD driver 14.4.
I also ran another series of tests: when I call the kernel function Build_testg2 twice from the host, the driver crashes exactly as it does with the second (crashing) loop.
Another interesting test was to remove the loop from the routine sdTooltballe_test:
float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)
{
float d1,d2,dmin;//,sprec;
//int i;
dmin=sdToolballg1_test(pos,t,0,0);
return dmin;
}
Using this new routine, I called the kernel function Build_testg2 100 times in a loop from the host, and it works fine.
So in the next test I changed the routine to:
float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)
{
float d1,d2,dmin;//,sprec;
int i;
dmin=sdToolballg1_test(pos,t,0,0);
//min=dmin+1.f;
for (i=0;i<3;i++)
{
d2=sdToolballg1_test(pos,t+s,0,0);
d1=sdToolballg1_test(pos,t-s,0,0);
if (d1<d2)
{
if (d1<dmin)
{
dmin=d1;
t-=s;
}
}
else
{
if (d2<dmin)
{
dmin=d2;
t+=s;
}
}
s*=0.5f;
}
return dmin;
}
With this routine the driver crashes the third time I call the function Build_testg2.
My conclusions about the possible cause:
1) It cannot be an OpenCL compiler problem (in compiling Test2parseg1), because calling the kernel function twice causes a crash anyway.
2) It cannot be a floating-point problem caused by "s" decreasing inside sdTooltballe_test, because it also crashes with a loop of 3 calls from the host.
3) It cannot be a "fault" state that makes the program fail on the second call, because it can work twice and crash on the third call, and the source data is always the same.
I will run more tests...
I am currently using Qt 5.2 for the graphical interface; I will build a console project that uses the complete .cl file.
Thank you,
Denis.
Hi Denis,
I really appreciate your enthusiasm for trying out different experiments to narrow down the problem and possibly find the root area.
You've posted several versions of the code you've experimented with, and you are still experimenting. I'm a little confused, as I don't know which one to start with. Since you've mentioned that you want to do more experiments, I think it is better for me to wait for your observations. When you need our support, just let us know and provide a complete test project (host + GPU) that can be easily compiled and run.
Regards,
Dipak
Hi Dipak,
the attachment to the first post of this thread contains a complete sample of the problem.
Tonight I ran an interesting test: in the file toolsim.cl I replaced the function
float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)
{
float d1,d2,dmin;
int i;
dmin=sdToolballg1_test(pos,t,0,0);
for (i=0;i<3;i++)
{
d2=sdToolballg1_test(pos,t+s,0,0);
d1=sdToolballg1_test(pos,t-s,0,0);
if (d1<d2)
{
if (d1<dmin)
{
dmin=d1;
t-=s;
}
}
else
{
if (d2<dmin)
{
dmin=d2;
t+=s;
}
}
s*=0.5f;
}
return dmin;
}
with the same function with the loop unrolled by hand:
float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)
{
float d1,d2,dmin;//,sprec;
int i;
dmin=sdToolballg1_test(pos,t,0,0);
d2=sdToolballg1_test(pos,t+s,0,0);
d1=sdToolballg1_test(pos,t-s,0,0);
if (d1<d2)
{
if (d1<dmin)
{
dmin=d1;
}
}
else
{
if (d2<dmin)
{
dmin=d2;
}
}
d2=sdToolballg1_test(pos,t+s,0,0);
d1=sdToolballg1_test(pos,t-s,0,0);
if (d1<d2)
{
if (d1<dmin)
{
dmin=d1;
}
}
else
{
if (d2<dmin)
{
dmin=d2;
}
}
d2=sdToolballg1_test(pos,t+s,0,0);
d1=sdToolballg1_test(pos,t-s,0,0);
if (d1<d2)
{
if (d1<dmin)
{
dmin=d1;
}
}
else
{
if (d2<dmin)
{
dmin=d2;
}
}
return dmin;
}
and it works.
Denis.
I found the solution (different from restructuring everything to remove one level of nested loops).