Archives Discussions

danval · ‎06-12-2014

The kernel below compiles successfully on some devices but causes an unhandled exception on others. I ran it through CodeXL in Analyze Mode and got the error below. Minor changes to the code tend to hide the problem. If this is a compiler issue that will take time to fix, can you please suggest a workaround?

Error message from CodeXL in Analyze Mode:

========== Build started: Building test.cl on 21 devices. ==========

Compiling device: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz... Succeeded!

Compiling device: Barts... Succeeded!

Compiling device: BeaverCreek... Succeeded!

Compiling device: Bonaire... ...Failed!

OpenCL Compile Error: clBuildProgram had an unhanded exception.

--------

Compiling device: Caicos...

Kernel code:


#define  BLCUTOFFR                              3

#define  TILESIZ_X                             32

#define  TILESIZ_Y                             26

#define  WINDOWSIZ_Y                           32

#define  WINDOWSIZ_X                           40





inline float myfunct(int i,float pixD, float pixIn) {

  float i_f = convert_float_rte(i);

  float dR2  = i_f*i_f;             

  float pixDiff= pixD-pixIn;        

  float dI2  = pixDiff*pixDiff;     

  float   aux    = 2.f*(  -dR2*0.01f  -  dI2*0.05f  );

  return(aux);

}





__kernel  

void mykernel(

  __global uchar4*  imageOut,

  __global uchar4*  imageIn

  )

{

  __local float4  locMem[(WINDOWSIZ_X/4)*WINDOWSIZ_Y];

  int wrkItmId   = get_local_id(1);



  __local float horLoc[TILESIZ_X*WINDOWSIZ_Y];

  __local float* horLocPtr;

  __local float* locMemFloat;



  horLocPtr = horLoc + wrkItmId*TILESIZ_X;

  locMemFloat = (__local float*) locMem + wrkItmId*WINDOWSIZ_X + BLCUTOFFR;



  for( int k = 0; k < TILESIZ_X; k ++ ){

  float pixIn   = locMemFloat[k];

  float pixOutTmp = 0.f;

  float normalize = 0.f;

  for( int i = -BLCUTOFFR; i <= BLCUTOFFR; i ++ ) {

  float pixD = locMemFloat[k+i];

  float myKerWeight = myfunct(i,pixD,pixIn);

  normalize += myKerWeight;

  pixOutTmp += pixD * myKerWeight;

  }

  horLocPtr[k] = native_divide(pixOutTmp, normalize);

  }



  barrier(CLK_LOCAL_MEM_FENCE);


  horLocPtr = horLoc + BLCUTOFFR * TILESIZ_X + wrkItmId;

  locMemFloat = (__local float*) locMem + wrkItmId;

  for( int k = 0; k < TILESIZ_Y; k ++ ){

  float pixIn   = horLocPtr[k*TILESIZ_X];

  float pixOutTmp = 0.f;

  float normalize = 0.f;

  for( int i = -BLCUTOFFR; i <= BLCUTOFFR; i ++ ) {

  float pixD = horLocPtr[(k+i)*TILESIZ_X];

  float myKerWeight = myfunct(i,pixD,pixIn);

  normalize += myKerWeight;

  pixOutTmp += pixD * myKerWeight;

  }

  locMemFloat[k*WINDOWSIZ_X] = native_divide(pixOutTmp, normalize);

  }


  barrier(CLK_LOCAL_MEM_FENCE);

  imageOut[0] = (uchar4)(0,1,2,3);

}
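Since the crash happens inside clBuildProgram, the kernel's own indexing is probably not the cause, but it is cheap to rule out. The following is a minimal host-side sketch in plain C that enumerates the local-memory indices touched by the two passes. It assumes the first pass reads locMemFloat[k] and writes horLocPtr[k], and it takes the work-group size along dimension 1 as a parameter because the thread does not state it. The helper names (range_t, pass1_locmem_reads, and so on) are hypothetical and only serve this illustration.

```c
#include <limits.h>

/* Constants copied from the kernel in the post. */
#define BLCUTOFFR    3
#define TILESIZ_X   32
#define TILESIZ_Y   26
#define WINDOWSIZ_Y 32
#define WINDOWSIZ_X 40

/* Sizes of the two __local arrays, counted in floats. */
#define LOCMEM_FLOATS ((WINDOWSIZ_X / 4) * WINDOWSIZ_Y * 4)
#define HORLOC_FLOATS (TILESIZ_X * WINDOWSIZ_Y)

typedef struct { int min, max; } range_t;

static void widen(range_t *r, int idx) {
    if (idx < r->min) r->min = idx;
    if (idx > r->max) r->max = idx;
}

/* First pass: reads locMem (viewed as floats) at row id, offset BLCUTOFFR+k+i. */
range_t pass1_locmem_reads(int local_size_y) {
    range_t r = { INT_MAX, INT_MIN };
    for (int id = 0; id < local_size_y; id++)
        for (int k = 0; k < TILESIZ_X; k++)
            for (int i = -BLCUTOFFR; i <= BLCUTOFFR; i++)
                widen(&r, id * WINDOWSIZ_X + BLCUTOFFR + k + i);
    return r;
}

/* First pass: writes horLoc at id*TILESIZ_X + k (ASSUMED index, see lead-in). */
range_t pass1_horloc_writes(int local_size_y) {
    range_t r = { INT_MAX, INT_MIN };
    for (int id = 0; id < local_size_y; id++)
        for (int k = 0; k < TILESIZ_X; k++)
            widen(&r, id * TILESIZ_X + k);
    return r;
}

/* Second pass: reads horLoc at column id, row BLCUTOFFR+k+i. */
range_t pass2_horloc_reads(int local_size_y) {
    range_t r = { INT_MAX, INT_MIN };
    for (int id = 0; id < local_size_y; id++)
        for (int k = 0; k < TILESIZ_Y; k++)
            for (int i = -BLCUTOFFR; i <= BLCUTOFFR; i++)
                widen(&r, BLCUTOFFR * TILESIZ_X + id + (k + i) * TILESIZ_X);
    return r;
}

/* Second pass: writes locMem (viewed as floats) at id + k*WINDOWSIZ_X. */
range_t pass2_locmem_writes(int local_size_y) {
    range_t r = { INT_MAX, INT_MIN };
    for (int id = 0; id < local_size_y; id++)
        for (int k = 0; k < TILESIZ_Y; k++)
            widen(&r, id + k * WINDOWSIZ_X);
    return r;
}

int in_bounds(range_t r, int size) {
    return r.min >= 0 && r.max < size;
}

/* Returns 1 if every access of both passes stays inside its array. */
int kernel_in_bounds(int local_size_y) {
    return in_bounds(pass1_locmem_reads(local_size_y),  LOCMEM_FLOATS)
        && in_bounds(pass1_horloc_writes(local_size_y), HORLOC_FLOATS)
        && in_bounds(pass2_horloc_reads(local_size_y),  HORLOC_FLOATS)
        && in_bounds(pass2_locmem_writes(local_size_y), LOCMEM_FLOATS);
}
```

Under the assumption of at most 32 work-items along dimension 1, kernel_in_bounds(32) reports that all accesses stay inside locMem and horLoc, which is consistent with the failure being a compiler problem rather than an out-of-bounds access in the kernel.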

dipak · ‎06-13-2014

Hi,

1. Could you please capture the details of the compilation error (using clGetProgramBuildInfo() with the parameter CL_PROGRAM_BUILD_LOG) and share them?

2. If optimization was enabled when the compilation failed (it is enabled by default in clBuildProgram()), could you please try compiling with optimization disabled and share your observations?

Regards,

danval · ‎06-13-2014

Hi Dipak,

1. The call to clBuildProgram itself crashes (segmentation fault on Linux, unhandled exception on Windows; see the full error message below), so the call to clGetProgramBuildInfo is never reached.

2. With the option -cl-opt-disable, the exception is not triggered and the program runs successfully. So unless there are other suggestions, I will use this option as a workaround until the problem is fixed. Any thoughts?

Daniel.

Full error message in Windows:

"Unhandled exception at 0x03B6379B (amdocl.dll) in bilateral_filter_dct.exe: 0xC0000005: Access violation reading location 0x00000004."

dipak · ‎06-16-2014

Hi Daniel,

Thanks for the information.

We'll try to reproduce the error and keep you updated. Meanwhile, as you've mentioned, building your program with optimization option "-cl-opt-disable" is the best workaround.

Regards,
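Because the workaround is meant to be temporary, it may help to keep it switchable rather than hard-coded. The following is a minimal sketch in plain C of a hypothetical helper, make_build_options, that composes the options string passed to clBuildProgram and appends -cl-opt-disable only on request. The helper name and the MYAPP_CL_NO_OPT environment variable in the usage comment are assumptions made for illustration; only clBuildProgram and the option strings come from OpenCL itself.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<base> -cl-opt-disable" (or just one of the
 * two parts) into out. base may be NULL or empty. Returns 0 on success and
 * -1 if the result does not fit into out_size bytes. */
int make_build_options(char *out, size_t out_size,
                       const char *base, int disable_opt)
{
    const char *b     = base ? base : "";
    const char *extra = disable_opt ? "-cl-opt-disable" : "";
    const char *sep   = (*b && *extra) ? " " : "";

    int n = snprintf(out, out_size, "%s%s%s", b, sep, extra);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Usage with the real OpenCL call (needs <CL/cl.h> and <stdlib.h>; program
 * and device are assumed to exist already, MYAPP_CL_NO_OPT is hypothetical):
 *
 *   char opts[256];
 *   make_build_options(opts, sizeof opts, "",
 *                      getenv("MYAPP_CL_NO_OPT") != NULL);
 *   cl_int err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

With a switch like this, the same binary can be retested with optimization enabled after a driver update, without rebuilding the host program.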

dipak · ‎08-26-2014

Hi Daniel,

The kernel code seems to work fine with optimization enabled (i.e. without the -cl-opt-disable flag) using driver fglrx-14.20 (see setup details below).

Driver: fglrx-14.20

APP SDK 2.9

Radeon HD 7870 Pitcairn

Ubuntu 14.04 LTS

I also tried to build the kernel using CodeXL [Catalyst 14.20, Radeon HD 8670D, Windows 7 (64)] and it compiled successfully for all the devices.

So, please try the same with the latest driver and let us know your observations.

Regards,


clBuildProgram had an unhandled exception