Strange deadlock

Question asked by d.a.a. on Jul 11, 2012
Latest reply on Jul 12, 2012 by nou

I'm trying to implement the scan primitive, but I ran into some strange behavior. Whenever the size of the global array 'x' is greater than 64 (the wavefront size), say 128, the following kernel hangs:

    kernel void scan_dead_lock( global float * x )
    {
        int id = get_local_id(0);

        for( int s = 1; s < get_local_size(0); s <<= 1 )
        {
            if( id >= s ) {
                float tmp = x[id] + x[id - s];
                barrier( CLK_GLOBAL_MEM_FENCE );
                x[id] = tmp;
            }
            else barrier( CLK_GLOBAL_MEM_FENCE );

            barrier( CLK_GLOBAL_MEM_FENCE );
        }
    }

But this one, which is functionally equivalent, does not:

    kernel void scan( global float * x )
    {
        float tmp;
        int id = get_local_id(0);

        for( int s = 1; s < get_local_size(0); s <<= 1 )
        {
            if( id >= s ) tmp = x[id] + x[id - s];

            barrier( CLK_GLOBAL_MEM_FENCE );

            if( id >= s ) x[id] = tmp;

            barrier( CLK_GLOBAL_MEM_FENCE );
        }
    }
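
For reference, both kernels are meant to compute an inclusive scan with the Hillis-Steele scheme: at step s, every element with index id >= s adds the element s positions to its left, and s doubles until it reaches the group size. The host-side C sketch below simulates one work-group in lockstep. The function `scan_lockstep` and its caller-provided `tmp` buffer are hypothetical, introduced only for illustration. Splitting each step into a read loop and a write loop plays the role of the first barrier, so every read of the old values completes before any write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: simulates one work-group of n work-items running the
 * scan kernel in lockstep, computing an inclusive Hillis-Steele scan of
 * x[0..n-1] in place. tmp is caller-provided scratch of length n that stands
 * in for each work-item's private 'tmp' variable. */
void scan_lockstep(float *x, float *tmp, size_t n)
{
    assert(n == 0 || (x != NULL && tmp != NULL));

    for (size_t s = 1; s < n; s <<= 1) {
        /* Phase 1 (before the first barrier): each work-item with id >= s
         * reads x[id] and x[id - s] from the values of the previous step. */
        for (size_t id = s; id < n; ++id)
            tmp[id] = x[id] + x[id - s];

        /* First barrier: all reads above finish before any write below. */

        /* Phase 2 (after the first barrier): write the new values back. */
        for (size_t id = s; id < n; ++id)
            x[id] = tmp[id];

        /* Second barrier: all writes finish before the next step reads. */
    }
}
```

For example, scanning {3, 1, 4, 1, 5} this way yields {3, 4, 8, 9, 14}.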

I'm setting the global size equal to the local size, so only one work-group is in flight. Could the OpenCL compiler be doing something wrong, for example removing the "else barrier( CLK_GLOBAL_MEM_FENCE );" branch from the first kernel?
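
As a rough CPU analogy only (it says nothing about how the AMD compiler or hardware schedules wavefronts), the barrier placement of the second kernel can be mimicked with POSIX threads, one thread per work-item. Like an OpenCL barrier, `pthread_barrier_wait` releases only once all participating threads have arrived, so the sketch calls it outside the `if`, where every thread reaches the same barrier. For comparison, the OpenCL 1.x specification states that if a barrier is inside a conditional statement, all work-items must enter the conditional if any work-item enters it and executes the barrier. The names `scan_threads`, `work_item`, `struct group` and `struct item` are hypothetical, and the sketch assumes a POSIX system that provides barriers (for example Linux with glibc).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state for one simulated work-group. */
struct group {
    float *x;
    size_t n;
    pthread_barrier_t bar;
};

/* Hypothetical per-thread argument: which "work-item" this thread plays. */
struct item {
    struct group *g;
    size_t id;
};

static void *work_item(void *arg)
{
    struct item *it = arg;
    struct group *g = it->g;
    size_t id = it->id;
    float tmp = 0.0f;

    for (size_t s = 1; s < g->n; s <<= 1) {
        if (id >= s) tmp = g->x[id] + g->x[id - s];
        /* Every thread reaches this same barrier, whether or not it took the branch. */
        pthread_barrier_wait(&g->bar);
        if (id >= s) g->x[id] = tmp;
        pthread_barrier_wait(&g->bar);
    }
    return NULL;
}

/* Inclusive scan of x[0..n-1] using n threads; returns 0 on success,
 * -1 if memory for the thread bookkeeping could not be allocated. */
int scan_threads(float *x, size_t n)
{
    if (n == 0) return 0;

    struct group g = { .x = x, .n = n };
    pthread_t *th = malloc(n * sizeof *th);
    struct item *items = malloc(n * sizeof *items);
    if (th == NULL || items == NULL) {
        free(th);
        free(items);
        return -1;
    }

    pthread_barrier_init(&g.bar, NULL, (unsigned)n);
    for (size_t i = 0; i < n; ++i) {
        items[i].g = &g;
        items[i].id = i;
        /* A partially created group would wait on the barrier forever, so abort. */
        if (pthread_create(&th[i], NULL, work_item, &items[i]) != 0) abort();
    }
    for (size_t i = 0; i < n; ++i) pthread_join(th[i], NULL);

    pthread_barrier_destroy(&g.bar);
    free(th);
    free(items);
    return 0;
}
```

On an array of 128 ones, `scan_threads(x, 128)` should leave x[i] equal to i + 1; compile with `-pthread`.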

The deadlock occurs with both Catalyst (fglrx) 12.4 and 12.6, using an AMD HD5750 GPU on Debian GNU/Linux.

Thank you.