diedalusus

Why does the Compiler behave this way?

Discussion created by diedalusus on Jan 25, 2011
Latest reply on Feb 1, 2011 by diedalusus
gpu compilation fails

Hi,

I am fairly new to OpenCL and have the following problems:

  1.  I wrote a long kernel (mostly consisting of 32 while-loops that do a binary search over a table and then interpolate over the values found). The problem is the following: I seem to hit a limit of the GPU compiler. It tries to compile the code (allocating more and more memory), then seems to fail (the memory is freed), and then tries again and again until it finishes with the following output:

             clBuildProgram fails with CL_BUILD_PROGRAM_FAILURE

             and the Build Log says: Error: Creating kernel get_Values failed!

          The code compiles fine on the CPU. I can't shorten my code because of the limitations of RAM and non-scalar vectors. So it would be nice to get a hint about what makes the compiler behave this way, or to get more data, maybe from a verbose mode of the compiler.

      2. Is there any connection between the bug where the clock frequency of the HD5870 isn't read out correctly (0 MHz) and the barrier function not working on the GPU? (I assume so because of CLK_LOCAL_MEM_FENCE, where CLK stands for clock? But maybe I am wrong.) I am asking because the barrier function is completely ignored on the GPU (it works fine on the CPU).

Thanks for the answers.

P.S.: my system: ubuntu x64 10.04LTS + HD5870

        Unfortunately I cannot attach the kernel because the code is under copyright, but if no one knows an answer I will try to reproduce the problem with another kernel.

So I figured out that when I change this code:

less_even.s0 = fabs(table[start_ind_even.s0*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s0*NMB_COLUMN]-param);
  less_even.s1 = fabs(table[start_ind_even.s1*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s1*NMB_COLUMN]-param);
  less_even.s2 = fabs(table[start_ind_even.s2*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s2*NMB_COLUMN]-param);
  less_even.s3 = fabs(table[start_ind_even.s3*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s3*NMB_COLUMN]-param);
  less_even.s4 = fabs(table[start_ind_even.s4*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s4*NMB_COLUMN]-param);
  less_even.s5 = fabs(table[start_ind_even.s5*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s5*NMB_COLUMN]-param);
  less_even.s6 = fabs(table[start_ind_even.s6*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s6*NMB_COLUMN]-param);
  less_even.s7 = fabs(table[start_ind_even.s7*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s7*NMB_COLUMN]-param);
  less_even.s8 = fabs(table[start_ind_even.s8*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s8*NMB_COLUMN]-param);
  less_even.s9 = fabs(table[start_ind_even.s9*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.s9*NMB_COLUMN]-param);
  less_even.sA = fabs(table[start_ind_even.sA*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.sA*NMB_COLUMN]-param);
  less_even.sB = fabs(table[start_ind_even.sB*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.sB*NMB_COLUMN]-param);
  less_even.sC = fabs(table[start_ind_even.sC*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.sC*NMB_COLUMN]-param);
  less_even.sD = fabs(table[start_ind_even.sD*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.sD*NMB_COLUMN]-param);
  less_even.sE = fabs(table[start_ind_even.sE*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.sE*NMB_COLUMN]-param);
  less_even.sF = fabs(table[start_ind_even.sF*NMB_COLUMN]-param) <= fabs(table[stop_ind_even.sF*NMB_COLUMN]-param);

to this one:

start_ind_even *= NMB_COLUMN;

less_even.s0 = fabs(table[start_ind_even.s0]-param) <= fabs(table[stop_ind_even.s0*NMB_COLUMN]-param);
  less_even.s1 = fabs(table[start_ind_even.s1]-param) <= fabs(table[stop_ind_even.s1*NMB_COLUMN]-param);
  less_even.s2 = fabs(table[start_ind_even.s2]-param) <= fabs(table[stop_ind_even.s2*NMB_COLUMN]-param);
  less_even.s3 = fabs(table[start_ind_even.s3]-param) <= fabs(table[stop_ind_even.s3*NMB_COLUMN]-param);
  less_even.s4 = fabs(table[start_ind_even.s4]-param) <= fabs(table[stop_ind_even.s4*NMB_COLUMN]-param);
  less_even.s5 = fabs(table[start_ind_even.s5]-param) <= fabs(table[stop_ind_even.s5*NMB_COLUMN]-param);
  less_even.s6 = fabs(table[start_ind_even.s6]-param) <= fabs(table[stop_ind_even.s6*NMB_COLUMN]-param);
  less_even.s7 = fabs(table[start_ind_even.s7]-param) <= fabs(table[stop_ind_even.s7*NMB_COLUMN]-param);
  less_even.s8 = fabs(table[start_ind_even.s8]-param) <= fabs(table[stop_ind_even.s8*NMB_COLUMN]-param);
  less_even.s9 = fabs(table[start_ind_even.s9]-param) <= fabs(table[stop_ind_even.s9*NMB_COLUMN]-param);
  less_even.sA = fabs(table[start_ind_even.sA]-param) <= fabs(table[stop_ind_even.sA*NMB_COLUMN]-param);
  less_even.sB = fabs(table[start_ind_even.sB]-param) <= fabs(table[stop_ind_even.sB*NMB_COLUMN]-param);
  less_even.sC = fabs(table[start_ind_even.sC]-param) <= fabs(table[stop_ind_even.sC*NMB_COLUMN]-param);
  less_even.sD = fabs(table[start_ind_even.sD]-param) <= fabs(table[stop_ind_even.sD*NMB_COLUMN]-param);
  less_even.sE = fabs(table[start_ind_even.sE]-param) <= fabs(table[stop_ind_even.sE*NMB_COLUMN]-param);
  less_even.sF = fabs(table[start_ind_even.sF]-param) <= fabs(table[stop_ind_even.sF*NMB_COLUMN]-param);

where table is a __global float* buffer, NMB_COLUMN is a constant passed in at compile time, and param is a float,

then the compiler shows this behaviour (this is not the complete code, as you can imagine).
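To make the comparison between the two snippets easier to follow, here is a minimal host-side sketch in plain C, not the kernel itself. Plain arrays stand in for the 16-component vectors, and the function names, the `LANES` macro and the value chosen for NMB_COLUMN are assumptions made only for this sketch. One deliberate difference from the second snippet: the scaled start indices go into a separate array instead of being scaled in place, so the original row indices stay available to any later code.

```c
#include <math.h>
#include <stddef.h>

#define LANES 16        /* components .s0 .. .sF of the 16-wide vectors */
#define NMB_COLUMN 4    /* assumed value; the real one is set at compile time */

/* Variant 1: scale both row indices inside the table access. */
static void less_inline(const float *table, const int *start_ind,
                        const int *stop_ind, float param, int *less)
{
    for (size_t i = 0; i < LANES; ++i) {
        less[i] = fabsf(table[start_ind[i] * NMB_COLUMN] - param)
               <= fabsf(table[stop_ind[i] * NMB_COLUMN] - param);
    }
}

/* Variant 2: scale the start indices once, up front, into a copy. */
static void less_prescaled(const float *table, const int *start_ind,
                           const int *stop_ind, float param, int *less)
{
    int start_off[LANES];   /* element offsets; start_ind is left untouched */

    for (size_t i = 0; i < LANES; ++i) {
        start_off[i] = start_ind[i] * NMB_COLUMN;
    }
    for (size_t i = 0; i < LANES; ++i) {
        less[i] = fabsf(table[start_off[i]] - param)
               <= fabsf(table[stop_ind[i] * NMB_COLUMN] - param);
    }
}
```

For the same inputs both functions fill `less` with identical values, so on the host the two forms are interchangeable as far as this one comparison goes.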

UPDATE: Compilation on the CPU runs fine, but execution stops with a SIGSEGV accessing address 0x0 in __OpenCL_get_Values_stub(). What is this function for?

backtrace:

#0  0x00007fffeb750bc0 in __OpenCL_get_values_stub () from /tmp/OCLbMaXbq.so
#1  0x00007fffe3828c2d in ?? () from /usr/local/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
#2  0x00007fffe3829791 in ?? () from /usr/local/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
#3  0x00007fffe3878c2c in ?? () from /usr/local/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
#4  0x00007fffe3876edd in ?? () from /usr/local/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
#5  0x00007ffff6a849ca in start_thread (arg=) at pthread_create.c:300
#6  0x00007ffff718d70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
