denis
Adept I

OpenCL program crash

Hello everyone,

I am trying to figure out why my OpenCL program crashes, and the behavior seems very strange. I ran these tests:

#define TTPROGRAMDATA  __global struct TProgramData

...

// WORKS

int Testparseg1(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos)

{

int cnt=0;

int i,ni=ProgramData->NumIstructions;//,imin;

float v,vmin;

bool fnd=false;

float MaxDistVoxel=ProgramData->VoxelData.VoxelRadius*2.0f;

float vraddia=ProgramData->VoxelData.VoxelRadius;

i=0;

while (i<ni&&(!fnd))

    {

    v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1[i] );

    if (v<=vraddia)

        fnd=true;

    i++;

    }

if (fnd)

    {

    vraddia*=3.0f;

    for (i=0;i<ni;i++)

        {

        v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1[i] );    //ONLY 1 CALL

        if (v<=vraddia)

            cnt++;

        }

    }

return cnt;

}

// CRASHES

int Testparseg1(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos)

{

int cnt=0;

int i,ni=ProgramData->NumIstructions;//,imin;

float v,vmin;

bool fnd=false;

float MaxDistVoxel=ProgramData->VoxelData.VoxelRadius*2.0f;

float vraddia=ProgramData->VoxelData.VoxelRadius;

i=0;

while (i<ni&&(!fnd))

    {

    v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1[i] );

    if (v<=vraddia)

        fnd=true;

    i++;

    }

if (fnd)

    {

    vraddia*=3.0f;

     // ----different rows---

    i=0;

    v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1[i] );

    if (v<=vraddia)

        cnt++;

    //-----------------------------

    for (i=1;i<ni;i++)

        {

        v=vTooltball( ProgramData,sprec,pos,&ProgramData->g1[i] );

        if (v<=vraddia)

            cnt++;

        }

    }

return cnt;

}

...

__kernel void Build_testg1(__constant struct TInputData *InputData,TTPROGRAMDATA *ProgramData,

                                TVOXELSPTR Voxels)

{

int    iVoxel=get_global_id(0);

#ifdef _DEBUG

if (iVoxel>=MAXNUMVOXELS)

    return ;

#endif

float3    vp=GetVoxelPnt(ProgramData,iVoxel);

Voxels[iVoxel].ListPtr=Testparseg1(ProgramData,InputData->ExtSPrec,vp);

}

The first version of Testparseg1 works; the only difference in the second one is an extra call before the for loop.

Both functions work when ProgramData->NumIstructions < 2000, that is, when the for loop runs fewer than about 2000 iterations.

Any suggestions?

Thank you,

Denis.

Message edited by Denis A.

Attached is an example program that crashes the driver.
The sample is built with Visual Studio 2010.
The bin folder contains a precompiled version with all the files the sample requires.
To compile the project you need Boost 1.55 and GLM 0.9.4.6.
I hope this helps to discover where the problem is.

Thank you,
Denis.

0 Likes
dipak
Big Boss

Hi denis,

It is difficult to make out where the error could be just from the information you have posted. I have a few queries/suggestions:

1. Does it print any message when it crashes? If so, what is the message?

2. Have you tried compiling the code without optimization (i.e. with the flag "-cl-opt-disable"), and do you get the same error? If not, please check. I just want to find out whether it's an optimization-related issue or something else.

3. Please make sure you are using the latest driver and APP SDK. If possible, reinstall them and check whether the crash still occurs.

It would be a great help if you could provide a sample project (host + GPU) so that we can reproduce the error. Also, please let us know your system setup details: CPU, GPU, SDK, driver, OS (Windows/Linux, 32/64-bit), etc.
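As a minimal sketch of suggestion 2, the build-options string could be assembled on the host as shown below. "-cl-opt-disable" and "-D name" are standard OpenCL build options; the helper name make_build_options and its two flags are hypothetical and do not come from the posted project. The resulting string would be passed as the options argument of clBuildProgram.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): compose the options string
   that the host would pass as the `options` argument of clBuildProgram. */
static int make_build_options(char *buf, size_t size,
                              int disable_opt, int define_debug)
{
    int n = snprintf(buf, size, "%s%s",
                     disable_opt  ? "-cl-opt-disable " : "",
                     define_debug ? "-D _DEBUG "       : "");
    /* snprintf returns the length it wanted to write; reject truncation. */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

If the build then fails, the compiler output can be read back with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG, which is usually the quickest way to answer question 1 for compile-time failures.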

Thanks,

Thank you, Dipak, for your answer.

1) The only message comes from Windows when it reloads the graphics driver after the crash.

2) No. I will try. Thank you for the suggestion.

3) I will also try to update the driver.

The project is currently too large to hand over as a sample, but if I do not find a solution I will certainly make a small test program.

Thanks,

Denis.

0 Likes

After the tests:

Before the driver update (release 13), building the program with the flag "-cl-opt-disable" made the compilation fail.

After the driver update to release 14, compilation is faster, but with the flag "-cl-opt-disable" the program crashes.

With the new driver the program crashes in every test I ran, even with ProgramData->NumIstructions < 100.

It looks as if the program is accessing memory out of range...

I will run more tests.
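To rule out an out-of-range access caused by the host, the launch parameters can be checked against the limits compiled into the kernel before each enqueue. This is only a sketch under assumptions: the two constants mirror MAXNUMVOXELS and MAXISTRUCTIONS from the .cl source posted later in this thread, and check_launch is a hypothetical helper that is not part of the project.

```c
#include <stddef.h>

/* Limits copied from the #defines in the posted .cl source. */
#define MAXNUMVOXELS   16000000
#define MAXISTRUCTIONS 100000

/* Hypothetical host-side check: returns 0 if the launch stays in range,
   or a negative code naming the first violated limit. */
static int check_launch(size_t global_size, int num_instructions)
{
    if (global_size == 0 || global_size > MAXNUMVOXELS)
        return -1;  /* more work-items than the MAXNUMVOXELS guard allows */
    if (num_instructions < 0 || num_instructions > MAXISTRUCTIONS)
        return -2;  /* the loop would read past ProgramData->g1 */
    return 0;
}
```

A check like this does not prove the kernel is safe, but it separates bad host input from a compiler or driver problem.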

0 Likes

Did you run any further tests?  Please share your observations and a simple reproducible test case so that we can investigate the problem.

0 Likes

Hi Prasad Hariharan,


I ran hundreds of tests. What I found is that the test fails when the outermost loop has more than about 1000 elements. I am not accessing memory out of range, but the driver crashes. I cannot compile without optimization because the compiler crashes. I tried simple examples, but with simple code everything works fine. CodeXL is not able to compile the program (the compilation phase never ends). The Intel compiler works fine. I don't know the best way to proceed.

One characteristic of the code is the presence of nested loops. In the past, to get the code to compile, I had to remove the outermost loop; otherwise the AMD compiler crashed, and the NVIDIA compiler was not able to build the code either.

I am now testing a restructured version of the code with one more outer loop removed.

The code works fine with a loop of about 100 elements and driver 13.

I will submit the part of the code where I get the problem.

I can also give you the entire project, but I cannot make it publicly available.
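One possibility that nobody in this thread has confirmed: a driver reset that appears only once the outer loop grows past a certain length is what a long-running kernel tripping the Windows GPU watchdog (Timeout Detection and Recovery) would look like. If that were the cause, a workaround would be to split the instruction range across several shorter kernel launches. The sketch below only computes the sub-ranges; next_chunk, count_launches and the idea of passing a begin/end pair to the kernel are assumptions, not part of the posted code.

```c
/* Hypothetical helper: walk [0, total) in pieces of at most `chunk` items.
   On entry *begin is the start of the next piece; on success *end is set
   and the function returns 1. It returns 0 once the range is exhausted. */
static int next_chunk(int total, int chunk, int *begin, int *end)
{
    if (chunk <= 0 || *begin >= total)
        return 0;
    *end = (total - *begin < chunk) ? total : *begin + chunk;
    return 1;
}

/* Count how many kernel launches a given split would need. */
static int count_launches(int total, int chunk)
{
    int begin = 0, end = 0, launches = 0;
    while (next_chunk(total, chunk, &begin, &end)) {
        /* Here the host would enqueue the kernel for g1[begin..end)
           and wait for it to finish before continuing. */
        begin = end;
        launches++;
    }
    return launches;
}
```

With this scheme the kernel would have to add each partial count to the value already stored for the voxel instead of overwriting it, which is a change to the posted kernel.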


Thank you for your interest.

0 Likes

This is the code that reproduces the problem:

//#define PROGRAMDATA_READONLYMEMORY  

#define _DEBUG

//#define LOWPROFILE

#define MAXNUMVOXELS        16000000

//#define    VOXELSSIZE            8000000

#define MAXNUMVOXELSLIST    80000000

#define    VOXELSLISTSIZE        320000000

#define MAXNUMPIXEL        2000000

#define FLAGS_AO        0x00000001

#define FLAGS_SHADOW    0x00000002

#define vec2       float2

#define vec3       float3

#define vec4       float4

#define mat2       float2x2

#define mat4       float4x4

#define TVOXELSPTR        __global struct TVoxel *

#define TCOLOR        float3

#define FresnelF    0.2f

#define SpecularF    16

  

#ifdef PROGRAMDATA_READONLYMEMORY

    #define TTG1                __constant struct TG1_

    #define TToolDef            __constant struct ToolDef

    #define    TTPROGRAMDATA        __constant struct TProgramData

#else

    #define TTG1                __global struct TG1_

    #define TToolDef            __global struct ToolDef

    #define TTPROGRAMDATA        __global struct TProgramData

#endif

#define MAXTOOLS        10

#define MAXISTRUCTIONS    100000

//#define GBUFFERTEST        if ((InputData->Moved==0)&&GBuffer[get_global_id(0)].ExLen>=1.0f&&(InputData->ShowTool<1||InputData->CurG>=InputData->NumIstructions))    return ;

#define GBUFFERTEST

#define GETGBUFFERPTR    &(GBuffer[get_global_id(0)])

#define GBUFFERNULLTEST    (GBuffer!=0)

struct __attribute__ ((packed)) TVoxel

    {

    int        ListPtr;  

    float    dist;

    }    ;              //8

struct __attribute__ ((packed)) TGCell

    {

    int        GIndex;  

    float    ExLen;

    }    ;           

struct __attribute__ ((packed)) ToolPoint

    {

    float3    pos;  

    float2    cb;      

    }    ;           

struct __attribute__ ((packed)) ToolDef

    {

    int    tooltype;  

    float len;

    float radius;

    }    ;           

#define istrType_extrude2x      0  

#define istrType_extrude5x      1  

#define istrType_revolve2x        2  

#define istrType_revolve2xi        3  

struct __attribute__ ((packed)) TG1_

    {

    int        tool;

    int        istrType;

    float StartTime;

    struct __attribute__ ((packed)) ToolPoint p1;

    float EndTime;

    struct __attribute__ ((packed)) ToolPoint p2;

    float len1;                     

    int        filler;

    }    ;  

  

struct __attribute__ ((packed)) TVoxelData

    {

    float VoxelSize;     

    float VoxelRadius;    

    int NumVoxelX;

    int NumVoxelY;

    int NumVoxelZ;

    };

struct __attribute__ ((packed)) TInputData

    {

    float2    iResolution;

    //uint CurSizeX;

    //uint CurSizeY;

    int4    iMouse;

    int4    iDate;

    float3  iRayOrigin;

    float3    iRayDir;

      

    float3  cu;

    float3  cv;

    float3  cw;

    int Moved;

    int        ShowTool;

    int        CurG;

    int        Flags;  

    float    ifView;

    float ExtSPrec;

  

    float RayCastMaxDist;    

    float RayCastPrec;        

    float3 SoftShadowLightDir;

    float  SoftShadowMint;     

    float  SoftShadowMaxt;   

    float  SoftShadowStep;       

    float  SoftShadowK;       

    int       AONumIterations;      

  

    float2    SubSamplingXY; 

    int        SubSamplingIndex;

    int        SubSamplingSize;   

    float    iGlobalTime;

    };

struct __attribute__ ((packed)) TProgramData

    {

    struct TVoxelData    VoxelData;

    float3  PanelSize;

    int        NumIstructions;

    struct ToolDef    __attribute__ ((packed)) Tools[MAXTOOLS];

    struct TG1_     __attribute__ ((packed)) g1[MAXISTRUCTIONS] ;

    }   ;

vec3 opMove(vec3 p,vec3 m)

{

    return p-m;

}

vec3 opMoveY(vec3 p,float y)

{

    p.y-=y;

    return p;

    //return (vec3)(p.x,p.y-y,p.z);

}

vec3 opRotateY(vec3 p,float a)

{

a=-a;

float3 v3;

float ca=cos(a);

float sa=sin(a);

v3.x=p.x*ca-p.z*sa;

v3.y=p.y;

v3.z=p.z*ca+p.x*sa;

return v3;

//return vec3(p.x*cos(a)-p.z*sin(a),p.y,p.z*cos(a)+p.x*sin(a));

}

vec3 opRotateZ(vec3 p,float a)

{

a=-a;

float3 v3;

float ca=cos(a);

float sa=sin(a);

v3.x=p.x*ca-p.y*sa;

v3.y=p.y*ca+p.x*sa;

v3.z=p.z;

return v3;

//return vec3(p.x*cos(a)-p.y*sin(a),p.y*cos(a)+p.x*sin(a),p.z);

}

float opExt1dir(float p, float x,float e2)

{

    float r;

  

    if (p<x)

        r=p;

    else

    {

    if (p>x+e2)

        r=p-e2;

    else

        r=x;

    }

    return r;  

}

vec3 opExtndir(vec3 p, vec3 x,vec3 e2)

{

    vec3 r;

  

    r.x=opExt1dir(p.x,x.x,e2.x);

    r.y=opExt1dir(p.y,x.y,e2.y);

    r.z=opExt1dir(p.z,x.z,e2.z);

  

    return r;

  

}

float sdSphere( vec3 p, float s )

{

    return length(p)-s;

}

float sdToolballg1_test(vec3 pos,float t,TTG1 *g1,TToolDef *tool)

{

    float t1=1.f-t;

    return sdSphere(

                    opExtndir(

                    opRotateZ(

                        opRotateY(

                            opMoveY(opMove(pos,(pos-0.1f)*t1+(pos+0.1f)*t),0.1f),

                            0.1f*t1+0.1f*t),

                        0.2f*t1+0.2f*t),

                (vec3)(0.0f,0.f,0.f),(vec3)(0.f,0.3f,0.f)),

                0.1f );

}

float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)

{

float d1,d2,dmin;//,sprec;

//int i;

dmin=sdToolballg1_test(pos,t,0,0);

//min=dmin+1.f;

while (s>sprec)

    {

    d2=sdToolballg1_test(pos,t+s,0,0);

    d1=sdToolballg1_test(pos,t-s,0,0);

    if (d1<d2)

        {

        if (d1<dmin)

            {

            dmin=d1;

            t-=s;

            }

        }

    else

        {

        if (d2<dmin)

            {

            dmin=d2;

            t+=s;

            }

        }

    s*=0.5f;         

    }

      

return dmin;

}

float vTooltball_test(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos,TTG1 *g1)

{

float    _fv=1.0f;

#ifdef _DEBUG

if (!g1)

    return 0;

if (g1->tool<0)

    return 0;

#endif

return sdTooltballe_test(ProgramData,sprec,pos,_fv*0.5f,_fv*0.25f,g1);

}

int Test2parseg1(TTPROGRAMDATA *ProgramData,float sprec,vec3 pos,TTG1 *pg1)

{

int i=0,cnt=0;

float v=0.0;

float MaxDistVoxel=ProgramData->VoxelData.VoxelRadius*2.0f;

float vraddia=ProgramData->VoxelData.VoxelRadius;

//using only this code it works

vraddia*=3.0f;

for (i=0;i<ProgramData->NumIstructions;i++)

    {

    v+=vTooltball_test( ProgramData,sprec,pos,&pg1[i] );

    vraddia+=v;

    if (v>-1000000||vraddia>0)  

        cnt++;

    }

    /*   

---------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------

----WITH THIS CODE THE DRIVER CRASH------------------------------

---------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------

for (i=1;i<ProgramData->NumIstructions;i++)

    {

    v+=vTooltball_test( ProgramData,sprec,pos,&pg1[i] );

    vraddia+=v;

    if (v>-1000000||vraddia>0)  

        cnt++;

    }

---------------------------------------------------------------------------------------------

*/

return cnt;

}

__kernel void Build_testg2(__constant struct TInputData *InputData,TTPROGRAMDATA *ProgramData,

                                TVOXELSPTR Voxels,TTG1 *pg1)

{

int    iVoxel=get_global_id(0);

#ifdef _DEBUG

if (iVoxel>=MAXNUMVOXELS)

    return ;

#endif

float3    vp=GetVoxelPnt(ProgramData,iVoxel);

Voxels[iVoxel].ListPtr=Test2parseg1(ProgramData,InputData->ExtSPrec,vp,pg1);

}

I hope this is enough to find a possible cause of the problem.

I am using a 7970 for the tests.

The complete .cl code is about 3000 lines long. Is that too much for an OpenCL program?
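Because TProgramData is filled on the host and read on the device, both sides have to agree on the byte layout of the structs in the listing above. In OpenCL C a float3 occupies 16 bytes (the same as a float4), which is easy to get wrong with a 12-byte host vector type. The host-side mirror below is a sketch under assumptions: the Host* type names are hypothetical, and the expected sizes are derived from the posted struct definitions, not from the actual host project.

```c
#include <stddef.h>

/* Host stand-ins for the OpenCL vector types. sizeof(float3) is 16 in
   OpenCL C, so the host mirror carries one float of padding. */
typedef struct { float s[4]; } HostFloat3;   /* x, y, z, pad */
typedef struct { float s[2]; } HostFloat2;

/* Every member is 4-byte aligned, so the compiler adds no padding; the
   layout is intended to match the packed device structs. */
typedef struct { int ListPtr; float dist; } HostVoxel;
typedef struct { HostFloat3 pos; HostFloat2 cb; } HostToolPoint;
typedef struct {
    int tool;
    int istrType;
    float StartTime;
    HostToolPoint p1;
    float EndTime;
    HostToolPoint p2;
    float len1;
    int filler;
} HostG1;

/* Compile-time checks (C11): fail the build if the layout drifts. */
_Static_assert(sizeof(HostVoxel) == 8, "TVoxel must be 8 bytes");
_Static_assert(sizeof(HostToolPoint) == 24, "ToolPoint must be 24 bytes");
_Static_assert(sizeof(HostG1) == 72, "TG1_ must be 72 bytes");
_Static_assert(offsetof(HostG1, p2) == 40, "p2 must start at byte 40");
```

If the host instead fills these fields from a 12-byte vec3, every field after the first float3 is shifted, and the kernel reads values that look like corrupted or out-of-range data.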

Thank you,

Denis.

0 Likes

Hi Denis,

Thanks for capturing and posting the problematic code. We'll try to compile your code and let you know our observations. Meanwhile, please let us know your system setup details: CPU, GPU, SDK, driver, OS (Windows/Linux, 32/64-bit), etc.

Regards,

0 Likes

Hi Dipak,

Thank you for your support.

I am using an Intel i7 930 processor with 6 GB of RAM, a Gigabyte HD 7970 GPU, Windows 7 64-bit as the operating system, and the beta AMD driver 14.4.

I also ran another test session: when I call the kernel function Build_testg2 twice from the host, the driver crashes exactly as it does with the second (crashing) loop.

Another interesting test was replacing the loop in the routine sdTooltballe_test:

float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)

{

float d1,d2,dmin;//,sprec;

//int i;

dmin=sdToolballg1_test(pos,t,0,0);

    

return dmin;

}

Using this new routine, I ran a loop calling the kernel function Build_testg2 100 times from the host, and it works fine.

So in the next test I changed the routine to

float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)

{

float d1,d2,dmin;//,sprec;

int i;

dmin=sdToolballg1_test(pos,t,0,0);

//min=dmin+1.f;

for (i=0;i<3;i++)

    {

    d2=sdToolballg1_test(pos,t+s,0,0);

    d1=sdToolballg1_test(pos,t-s,0,0);

    if (d1<d2)

        {

        if (d1<dmin)

            {

            dmin=d1;

            t-=s;

            }

        }

    else

        {

        if (d2<dmin)

            {

            dmin=d2;

            t+=s;

            }

        }

    s*=0.5f;        

    }

     

return dmin;

}

With this routine, the driver crashes the third time I call the kernel function Build_testg2.

My conclusions on the possible cause:

1) It cannot be an OpenCL compiler problem (in compiling Test2parseg1), because calling the kernel function twice causes a crash anyway.

2) It cannot be a floating-point problem caused by the decreasing "s" in sdTooltballe_test, because it also crashes with a loop of 3 iterations when called from the host.

3) It cannot be a "fault" state left behind by the first call that makes the second one crash, because it can work twice and crash the third time, and the source data is always the same.
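For reference when weighing point 2: the while (s>sprec) search in the original sdTooltballe_test runs for a number of steps that depends entirely on InputData->ExtSPrec, a value supplied by the host. The plain C sketch below counts those steps and adds an upper bound so that a zero or negative precision cannot keep the loop running. The names count_halvings, max_iter and evals_per_work_item are hypothetical, and the cap is an assumption rather than something the posted kernel does.

```c
/* Count the iterations of the step-halving loop `while (s > sprec)`,
   stopping early at max_iter (hypothetical safety cap). */
static int count_halvings(float s, float sprec, int max_iter)
{
    int n = 0;
    while (s > sprec && n < max_iter) {
        s *= 0.5f;   /* same update as in sdTooltballe_test */
        n++;
    }
    return n;
}

/* Distance evaluations done by one work-item: for each of the
   num_instructions entries, one initial evaluation plus two per step. */
static long evals_per_work_item(int num_instructions, float s, float sprec,
                                int max_iter)
{
    return (long)num_instructions * (1 + 2L * count_halvings(s, sprec, max_iter));
}
```

With the starting step of 0.25 used by vTooltball_test and an assumed precision of 0.001, the loop takes 8 steps, so each instruction costs 17 distance evaluations per voxel. The actual value of ExtSPrec is not given in the thread.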

I will run more tests...

I am currently using Qt 5.2 for the graphical interface; I will build a console project using the complete .cl file.

Thank you,

Denis.

0 Likes

Hi Denis,

I really appreciate your enthusiasm for trying out different experiments to narrow down the problem and possibly find its root cause.

You've posted several versions of the code you've experimented with, and you are still experimenting. I'm a little confused, as I don't know which one to start with. Since you've mentioned that you want to run more experiments, I think it is better for me to wait for your observations. When you need our support, just let us know and provide a complete test project (host + GPU) that can be easily compiled and run.

Regards,

Dipak

0 Likes

Hi Dipak,

The attachment to the first post of this thread contains a complete sample that reproduces the problem.

Tonight I ran an interesting test, replacing in the file toolsim.cl the function

float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)

{

float d1,d2,dmin;

int i;

dmin=sdToolballg1_test(pos,t,0,0);

for (i=0;i<3;i++)

    {

    d2=sdToolballg1_test(pos,t+s,0,0);

    d1=sdToolballg1_test(pos,t-s,0,0);

    if (d1<d2)

        {

        if (d1<dmin)

            {

            dmin=d1;

            t-=s;

            }

        }

    else

        {

        if (d2<dmin)

            {

            dmin=d2;

            t+=s;

            }

        }

    s*=0.5f;          

    }

       

return dmin;

}

with the same function, with the loop unrolled by hand:

float sdTooltballe_test(TTPROGRAMDATA *InputData,float sprec,vec3 pos,float t,float s,TTG1 *g1)

{

float d1,d2,dmin;//,sprec;

int i;

dmin=sdToolballg1_test(pos,t,0,0);

    d2=sdToolballg1_test(pos,t+s,0,0);

    d1=sdToolballg1_test(pos,t-s,0,0);

    if (d1<d2)

        {

        if (d1<dmin)

            {

            dmin=d1;

            }

        }

    else

        {

        if (d2<dmin)

            {

            dmin=d2;

            }

        }

    d2=sdToolballg1_test(pos,t+s,0,0);

    d1=sdToolballg1_test(pos,t-s,0,0);

    if (d1<d2)

        {

        if (d1<dmin)

            {

            dmin=d1;

            }

        }

    else

        {

        if (d2<dmin)

            {

            dmin=d2;

            }

        }

    d2=sdToolballg1_test(pos,t+s,0,0);

    d1=sdToolballg1_test(pos,t-s,0,0);

    if (d1<d2)

        {

        if (d1<dmin)

            {

            dmin=d1;

            }

        }

    else

        {

        if (d2<dmin)

            {

            dmin=d2;

            }

        }

       

return dmin;

}

and it works.
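One caveat when comparing the two versions: the hand-unrolled function drops the t-=s, t+=s and s*=0.5f updates, so it evaluates the same two points three times and is not equivalent to the 3-iteration loop. A compiler may also collapse the repeated calls, which could be one reason that version behaves differently. The plain C sketch below shows an unrolled form that keeps the updates; dist is a hypothetical stand-in for sdToolballg1_test, and all names are illustrative only.

```c
/* Hypothetical stand-in for sdToolballg1_test: distance to t = 0.3. */
static float dist(float t)
{
    float d = t - 0.3f;
    return d < 0.0f ? -d : d;
}

/* One refinement step, written once so both versions share it. */
static void step(float *t, float *s, float *dmin)
{
    float d2 = dist(*t + *s);
    float d1 = dist(*t - *s);
    if (d1 < d2) {
        if (d1 < *dmin) { *dmin = d1; *t -= *s; }
    } else {
        if (d2 < *dmin) { *dmin = d2; *t += *s; }
    }
    *s *= 0.5f;   /* the update missing from the hand-unrolled post */
}

/* Loop form, as in the 3-iteration test. */
static float search_loop(float t, float s)
{
    float dmin = dist(t);
    for (int i = 0; i < 3; i++)
        step(&t, &s, &dmin);
    return dmin;
}

/* Unrolled form that preserves the t and s updates. */
static float search_unrolled(float t, float s)
{
    float dmin = dist(t);
    step(&t, &s, &dmin);
    step(&t, &s, &dmin);
    step(&t, &s, &dmin);
    return dmin;
}
```

Only an unrolled version of this kind does the same work as the loop, so it is the fairer comparison when testing whether the loop construct itself triggers the crash.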

Denis.

0 Likes
denis
Adept I

I found the solution (different from restructuring everything to remove one level of nested loops).

0 Likes