The kernel below compiles successfully on some devices but causes an unhandled exception on others. I ran it through CodeXL in Analyze Mode and got the error below. Minor changes to the code tend to hide the problem. If this is a compiler issue that will take time to fix, could you please suggest a workaround?
Error message from CodeXL in Analyze Mode:
========== Build started: Building test.cl on 21 devices. ==========
Compiling device: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz... Succeeded!
Compiling device: Barts... Succeeded!
Compiling device: BeaverCreek... Succeeded!
Compiling device: Bonaire... ...Failed!
OpenCL Compile Error: clBuildProgram had an unhanded exception.
--------
Compiling device: Caicos...
Kernel code:
#define BLCUTOFFR 3
#define TILESIZ_X 32
#define TILESIZ_Y 26
#define WINDOWSIZ_Y 32
#define WINDOWSIZ_X 40

inline float myfunct(int i, float pixD, float pixIn) {
    float i_f = convert_float_rte(i);
    float dR2 = i_f*i_f;
    float pixDiff = pixD - pixIn;
    float dI2 = pixDiff*pixDiff;
    float aux = 2.f*( -dR2*0.01f - dI2*0.05f );
    return(aux);
}

__kernel
void mykernel(
    __global uchar4* imageOut,
    __global uchar4* imageIn
)
{
    __local float4 locMem[(WINDOWSIZ_X/4)*WINDOWSIZ_Y];
    int wrkItmId = get_local_id(1);
    __local float horLoc[TILESIZ_X*WINDOWSIZ_Y];
    __local float* horLocPtr;
    __local float* locMemFloat;

    // Horizontal pass: each work-item filters one row of locMem into horLoc
    horLocPtr = horLoc + wrkItmId*TILESIZ_X;
    locMemFloat = (__local float*) locMem + wrkItmId*WINDOWSIZ_X + BLCUTOFFR;
    for( int k = 0; k < TILESIZ_X; k ++ ){
        float pixIn = locMemFloat[k];
        float pixOutTmp = 0.f;
        float normalize = 0.f;
        for( int i = -BLCUTOFFR; i <= BLCUTOFFR; i ++ ) {
            float pixD = locMemFloat[k+i];
            float myKerWeight = myfunct(i,pixD,pixIn);
            normalize += myKerWeight;
            pixOutTmp += pixD * myKerWeight;
        }
        horLocPtr[k] = native_divide(pixOutTmp, normalize);
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    // Vertical pass: each work-item filters one column of horLoc back into locMem
    horLocPtr = horLoc + BLCUTOFFR * TILESIZ_X + wrkItmId;
    locMemFloat = (__local float*) locMem + wrkItmId;
    for( int k = 0; k < TILESIZ_Y; k ++ ){
        float pixIn = horLocPtr[k*TILESIZ_X];
        float pixOutTmp = 0.f;
        float normalize = 0.f;
        for( int i = -BLCUTOFFR; i <= BLCUTOFFR; i ++ ) {
            float pixD = horLocPtr[(k+i)*TILESIZ_X];
            float myKerWeight = myfunct(i,pixD,pixIn);
            normalize += myKerWeight;
            pixOutTmp += pixD * myKerWeight;
        }
        locMemFloat[k*WINDOWSIZ_X] = native_divide(pixOutTmp, normalize);
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    imageOut[0] = (uchar4)(0,1,2,3);
}
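To make the arithmetic of the repro easier to follow, here is a minimal host-side C sketch of one output sample of the horizontal pass. The names myfunct_ref and filter_sample_ref are hypothetical; the weight formula and the normalized weighted sum are copied from the kernel above, and ordinary division stands in for native_divide(), which may be less precise on the device. As written, myfunct() returns non-positive weights (no exp() is applied), and the center weight is 0 while every other weight is strictly negative, so dividing by their sum turns the result into a weighted mean of the seven window values.

```c
/* Mirrors the kernel's BLCUTOFFR: half-width of the filter window. */
#define BLCUTOFFR 3

/* Host-side copy of the kernel's myfunct(): weight for neighbour offset i,
   neighbour value pixD and centre value pixIn. As in the posted kernel,
   no exp() is applied, so every returned weight is <= 0. */
static float myfunct_ref(int i, float pixD, float pixIn)
{
    float i_f = (float)i;           /* exact for small ints, like convert_float_rte(i) */
    float dR2 = i_f * i_f;          /* squared spatial distance */
    float pixDiff = pixD - pixIn;
    float dI2 = pixDiff * pixDiff;  /* squared intensity difference */
    return 2.f * (-dR2 * 0.01f - dI2 * 0.05f);
}

/* One output sample of the horizontal pass. row[] must hold valid values
   at indices center-BLCUTOFFR .. center+BLCUTOFFR. */
static float filter_sample_ref(const float *row, int center)
{
    float pixIn = row[center];
    float pixOutTmp = 0.f;
    float normalize = 0.f;
    for (int i = -BLCUTOFFR; i <= BLCUTOFFR; i++) {
        float pixD = row[center + i];
        float w = myfunct_ref(i, pixD, pixIn);
        normalize += w;             /* always < 0 because |i| >= 1 terms exist */
        pixOutTmp += pixD * w;
    }
    /* The kernel uses native_divide(); plain division here. */
    return pixOutTmp / normalize;
}
```

Since the kernel itself only writes a constant to imageOut, this reference is meant for reading the code, not for validating device output.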
Hi,
1. Could you please capture the details of the compilation error (using clGetProgramBuildInfo() with param CL_PROGRAM_BUILD_LOG) and share them?
2. Optimization is enabled by default in clBuildProgram(). Could you please try compiling the same kernel with optimization disabled and share your observations?
Regards,
Hi Dipak,
1. The call to clBuildProgram crashes (segmentation fault on Linux, unhandled exception on Windows; see the full error message below), so the call to clGetProgramBuildInfo is never reached.
2. With the optimization option -cl-opt-disable, the exception does not occur and the program runs successfully. Unless there are other suggestions, I will use this option as a workaround until the problem is fixed. Any thoughts?
Daniel.
Full error message in Windows:
"Unhandled exception at 0x03B6379B (amdocl.dll) in bilateral_filter_dct.exe: 0xC0000005: Access violation reading location 0x00000004."
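Because the crash happens inside clBuildProgram() itself, the application cannot safely recover and retry with different options; the options have to be chosen before the call. A minimal sketch, assuming you want to keep optimizations on unaffected devices, is to pick the build options from the device name. The list kAffectedDevices and the function build_options_for are hypothetical. Only Bonaire is confirmed by the CodeXL log above (which is cut off after Caicos), and the match assumes the runtime's CL_DEVICE_NAME reports the same codename that CodeXL shows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of device names on which clBuildProgram() crashed for
   this kernel. Only "Bonaire" is confirmed by the CodeXL log in this thread;
   extend it from your own testing. */
static const char *const kAffectedDevices[] = { "Bonaire" };

/* Returns the options string to pass to clBuildProgram() for a device whose
   CL_DEVICE_NAME is device_name: "-cl-opt-disable" for listed devices,
   an empty string (default optimizations) for everything else. */
static const char *build_options_for(const char *device_name)
{
    size_t n = sizeof kAffectedDevices / sizeof kAffectedDevices[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(device_name, kAffectedDevices[i]) == 0)
            return "-cl-opt-disable";
    }
    return "";
}
```

In host code the name would come from clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof name, name, NULL), and the result would be passed as the options argument, e.g. clBuildProgram(program, 1, &device, build_options_for(name), NULL, NULL). Passing "-cl-opt-disable" for every device is simpler, at the cost of unoptimized code everywhere.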
Hi Daniel,
Thanks for the information.
We'll try to reproduce the error and keep you updated. Meanwhile, as you mentioned, building your program with the "-cl-opt-disable" option is the best workaround.
Regards,
Hi Daniel,
The kernel code appears to work fine with optimization enabled (i.e. without the -cl-opt-disable flag) using driver fglrx-14.20 (setup details below).
Driver: fglrx-14.20
APP SDK 2.9
Radeon HD 7870 Pitcairn
Ubuntu 14.04 LTS
I also built the kernel in CodeXL [Catalyst 14.20, Radeon HD 8670D, Windows 7 64-bit] and it compiled successfully for all devices.
Please try again with the latest driver and let us know your observations.
Regards,