
mem_fence() needed for one wavefront?

Question asked by pdsmith on Sep 22, 2013
Latest reply on Sep 25, 2013 by himanshu.gautam

My understanding is that the reduction below does not require memory fences as long as the work-group size equals one wavefront (64 work-items on the HD 7950).

However, it gives incorrect results unless I add a mem_fence(CLK_LOCAL_MEM_FENCE) after each write to local memory. I am using __attribute__((reqd_work_group_size(64,1,1))) in my kernel.

I have successfully used this code before. Either something has changed, or I have found a bug in my own code.

 

  // Reduction min
  dt_min_local[loc_mempos] = fmin( dt_min_local[loc_mempos] , dt_min_local[loc_mempos+32] );
  dt_min_local[loc_mempos] = fmin( dt_min_local[loc_mempos] , dt_min_local[loc_mempos+16] );
  dt_min_local[loc_mempos] = fmin( dt_min_local[loc_mempos] , dt_min_local[loc_mempos+8 ] );
  dt_min_local[loc_mempos] = fmin( dt_min_local[loc_mempos] , dt_min_local[loc_mempos+4 ] );
  dt_min_local[loc_mempos] = fmin( dt_min_local[loc_mempos] , dt_min_local[loc_mempos+2 ] );

  // Write to global memory
  if (loc_mempos == 0)
    {
      d_min_max_workgroup[get_group_id(0)] = fmin( dt_min_local[loc_mempos] , dt_min_local[loc_mempos+1 ] );
    }
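
For comparison, here is a minimal sketch of the same min-reduction written so that it does not depend on wavefront lockstep at all. It is an illustration under assumptions, not the original kernel: the kernel name reduce_min_sketch, the input buffer dt, the WG_SIZE macro, and the way dt_min_local is declared and filled are all hypothetical. Only dt_min_local, loc_mempos and d_min_max_workgroup come from the snippet above. It also assumes the global size is a multiple of 64.

```c
// Hypothetical names: reduce_min_sketch, dt, WG_SIZE.
// Assumption: global work size is a multiple of WG_SIZE and dt holds
// one float per work-item.
#define WG_SIZE 64

__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
void reduce_min_sketch(__global const float *dt,
                       __global float *d_min_max_workgroup)
{
    __local float dt_min_local[WG_SIZE];

    const uint loc_mempos = get_local_id(0);

    // Each work-item stages one value in local memory.
    dt_min_local[loc_mempos] = dt[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction: the active range halves at every step.
    for (uint stride = WG_SIZE / 2; stride > 0; stride >>= 1) {
        if (loc_mempos < stride) {
            dt_min_local[loc_mempos] = fmin(dt_min_local[loc_mempos],
                                            dt_min_local[loc_mempos + stride]);
        }
        // Outside the conditional, so every work-item reaches it.
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Work-item 0 now holds the minimum for this work-group.
    if (loc_mempos == 0) {
        d_min_max_workgroup[get_group_id(0)] = dt_min_local[0];
    }
}
```

The barrier after each step makes the ordering explicit: OpenCL only guarantees that local-memory writes are visible to the other work-items in the work-group at a barrier, so without one the compiler is free to keep intermediate values in registers. Whether the extra barriers cost anything measurable for a single-wavefront work-group would have to be checked on the actual driver.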

 

Info: Fedora 19, ATI 13.8, HD 7950
