CLC compilation times growing (exponentially?)

Discussion created by broxvall on Nov 22, 2010
Latest reply on Mar 19, 2011 by himanshu.gautam
Problem: the OpenCL 1.1 compiler takes too long (> 30 sec) to compile a small test example


I've encountered strange behaviour in the OpenCL 1.1 compiler (64-bit Linux). The initial symptom was that, when compiling the kernels of a program that uses the OpenCL C++ bindings, my program simply hangs in the call to the "build(...)" function. Whether this happens depends on the actual code being compiled by CL.

I've stripped the code down to the most extreme case I could find and ended up in the odd situation where, depending on whether two variables were initialized with a constant or with a trigonometric expression, I got compilation times of 3.9 seconds (both constant), 5.4 seconds (one constant) and 25 seconds (both trigonometric).

I've marked the relevant lines A/B and C/D to show where the problem seems to occur: A and C are the constant initializations, B and D the trigonometric ones.

For the larger program, the compilation doesn't terminate within one hour. (I can attach that too if you ask me, but I think this first case already shows the problem better.) Also, the problem only occurs when compiling for the GPU (not for the CPU). I believe that the same jump in compilation time occurs for a few more "tricky" floating-point operations (e.g. the ones that require the T-processing element) and that this creates, e.g., a scheduling problem whose cost grows exponentially (?)

Can anyone comment on this? Any hints for workarounds?

/ Mathias


I've done a simple " while [ 1 ]; do ps aux | grep 'clc' | grep -v grep ; done" which gave me:

mbl      13803  0.0  0.0   4148   576 pts/1    S+   19:03   0:00 sh -c /media/disk/opencl/ati-stream-sdk-v2.2-lnx64/bin/x86_64/clc --emit=llvmbc -D__IMAGE_SUPPORT__=1 -D__Juniper__=1 -D__GPU__=1 -Dcl_khr_global_int32_base_atomics=1 -Dcl_khr_global_int32_extended_atomics=1 -Dcl_khr_local_int32_base_atomics=1 -Dcl_khr_local_int32_extended_atomics=1 -Dcl_khr_3d_image_writes=1 -Dcl_khr_byte_addressable_store=1 -Dcl_khr_gl_sharing=1 -Dcl_amd_device_attribute_query=1 -Dcl_amd_printf=1 -Dcl_amd_media_ops=1 -I./ -o "/tmp/OCLBH0vhZ1.bc" "/tmp/"  2> "/tmp/OCLBH0vhZ.log"

float4 mycubicSolver(float a,float b,float c,float d) {
  float4 res;
  float2 u,v;
  u = (float2)(0.0);
  v = (float2)(0.0);

  float Phi1=atan2(u.s1,u.s0)/3.f;
  u.s0 = 0.0f;           // A
  //u.s0 = cos(Phi1);    // B
  u.s1 = sin(Phi1);

  float Phi2=atan2(v.s1,v.s0)/3.f;
  //v.s0 = 0.0f;         // C
  v.s0 = cos(Phi2);      // D
  v.s1 = sin(Phi2);

  float2 x1, x2;
  x1.s0 = u.s0 + v.s0;
  x1.s1 = 0.f;

  float2 uv;
  uv.s0 = 0.f;
  uv.s1 = u.s1 - v.s1;

  x2.s0 = -x1.s0/2 - sqrt(3.f)/2.f * uv.s1;
  x2.s1 = + sqrt(3.f)/2.f * uv.s0;

  res.s2 = x2.s0;
  res.s3 = x2.s1;
  return res;
}

__kernel void filtering(float4 offset,__global unsigned char *rawIn,__global float *tensorsIn,__global float *filteredOut) {
  float4 res = mycubicSolver(-1, 1., 1, 1);
}
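For reference, a host-side sketch of the same arithmetic in plain C++, which may help when checking that a workaround does not change the results. The struct Float2, the function solver_x2 and its two flags are hypothetical stand-ins introduced only for this illustration; the flags select between the constant lines (A, C) and the trigonometric lines (B, D). It assumes the host math library follows C99 Annex F, so that atan2(0, 0) returns 0.

```cpp
#include <cmath>

// Hypothetical stand-in for the OpenCL float2 type.
struct Float2 { float s0, s1; };

// Host-side transcription of the x2 computation in mycubicSolver.
// use_B / use_D pick the cos() lines (B, D) instead of the constants (A, C).
Float2 solver_x2(bool use_B, bool use_D) {
    Float2 u{0.0f, 0.0f};
    Float2 v{0.0f, 0.0f};

    float Phi1 = std::atan2(u.s1, u.s0) / 3.f;
    u.s0 = use_B ? std::cos(Phi1) : 0.0f;   // B : A
    u.s1 = std::sin(Phi1);

    float Phi2 = std::atan2(v.s1, v.s0) / 3.f;
    v.s0 = use_D ? std::cos(Phi2) : 0.0f;   // D : C
    v.s1 = std::sin(Phi2);

    Float2 x1{u.s0 + v.s0, 0.f};
    Float2 uv{0.f, u.s1 - v.s1};

    Float2 x2;
    x2.s0 = -x1.s0 / 2 - std::sqrt(3.f) / 2.f * uv.s1;
    x2.s1 = std::sqrt(3.f) / 2.f * uv.s0;
    return x2;
}
```

Since u and v start at zero, every value here follows from literals alone, so in this stripped-down case the slow variants differ from the fast one only in which trigonometric calls on constant arguments the compiler has to process.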