4 Replies Latest reply on Jul 12, 2012 8:20 AM by nou

    Strange dead lock

    d.a.a.

      I'm trying to implement the scan primitive, but I came across a weird behavior. Whenever the dimension of the global array 'x' is greater than 64 (the wavefront size), let's say 128, the following kernel hangs up:

       

         kernel void scan_dead_lock( global float * x )
        {
           int id = get_local_id(0);
      
           for( int s = 1; s < get_local_size(0); s <<= 1 )
           {
              if( id >= s ) {
                 float tmp = x[id] + x[id - s];
                 barrier( CLK_GLOBAL_MEM_FENCE );
                 x[id] = tmp;
              }
              else barrier( CLK_GLOBAL_MEM_FENCE );
      
              barrier( CLK_GLOBAL_MEM_FENCE );
           }
         } 
      

      But this one, functionally equivalent, does not:

       

        kernel void scan( global float * x )
         {
           float tmp;
           int id = get_local_id(0);
      
           for( int s = 1; s < get_local_size(0); s <<= 1 )
           {
              if( id >= s ) tmp = x[id] + x[id - s];
      
              barrier( CLK_GLOBAL_MEM_FENCE );
      
              if( id >= s ) x[id] = tmp;
      
              barrier( CLK_GLOBAL_MEM_FENCE );
           }
         } 
      

      I'm using 'global size' = 'local size', so there is only a work-group in flight. Is the OpenCL compiler doing something wrong, maybe removing the "else barrier( CLK_GLOBAL_MEM_FENCE );" part from the first kernel?

       

      The dead lock occurs both with Catalyst (fglrx) 12.4 and 12.6 using an AMD HD5750 GPU on Debian GNU/Linux.

       

      Thank you.