
Unexpected results on CPU with optimizations enabled

Question asked by ikalou on Aug 21, 2014
Latest reply on Aug 25, 2014 by dipak

I want to remove an unnecessary write to global memory (threshold_ptr_plus_flags) at the beginning of my kernel by keeping the value in a private, per-work-item variable (plus_flag) instead. My modified code appears to run correctly on both the CPU and the GPU with "-cl-opt-disable", but it produces incorrect results on the CPU when optimizations are enabled.

 

This is my kernel:

 

__kernel void setup_plus_minus_A(
    const int index,
    __local char* const threshold_ptr_plus_tmp_flags,
    __global char* const threshold_ptr_plus_flags,
    __constant const float* const candidates_value
)
{
    char plus_flag;
    const int id = get_global_id(0);
    const int local_id = get_local_id(0);

    if (id > 0) {
        if (id < index && candidates_value[id - 1] == candidates_value[id]) {
            threshold_ptr_plus_flags[id] = 0;
            plus_flag = 0;
        } else {
            threshold_ptr_plus_flags[id] = 1;
            plus_flag = 1;
        }
    } else {
        threshold_ptr_plus_flags[0] = 0;
        plus_flag = 0;
    }

    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

    if (threshold_ptr_plus_flags[id] != plus_flag) {
        printf((__constant char *)"AP Id: %d diff: %d vs %d\n", id, (int)threshold_ptr_plus_flags[id], (int)plus_flag);
    }

    threshold_ptr_plus_tmp_flags[local_id] = plus_flag; // threshold_ptr_plus_flags[id]; <- Here I would like to use plus_flag instead of threshold_ptr...
    /* ...not shown here: prefix sum on threshold_ptr_plus_tmp_flags... */

    /* Write results back to global memory */
    threshold_ptr_plus_flags[id] = threshold_ptr_plus_tmp_flags[...];
}





 

With optimizations enabled on the CPU, the above kernel gives me incorrect results for some threads and prints the following:

Id: 738 diff: 1 vs 0

...

Id: 768 diff: 0 vs 1

...
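
To tell which of the two printed values is the correct one for a given id, the flag logic from the first part of the kernel can be reproduced on the host. The following is only a minimal sketch in plain C: the helper names expected_plus_flags and report_mismatches are hypothetical and not part of the kernel or of any OpenCL API, and count is assumed to be the global work size.

```c
#include <stdio.h>

/* Hypothetical host-side reference (assumption, not from the original code):
   computes the flag each work-item should produce before the barrier.
   Mirrors the kernel: id 0 gets 0; otherwise 0 only when id < index and
   the value equals its predecessor, else 1. */
static void expected_plus_flags(const float *candidates_value, int index,
                                int count, char *flags)
{
    for (int id = 0; id < count; ++id) {
        /* Short-circuit keeps candidates_value[id] from being read
           when id >= index, exactly as in the kernel. */
        int same_as_previous = id > 0 && id < index &&
                               candidates_value[id - 1] == candidates_value[id];
        flags[id] = (char)((id > 0 && !same_as_previous) ? 1 : 0);
    }
}

/* Hypothetical helper: prints every id where the two arrays differ
   and returns the number of differences. */
static int report_mismatches(const char *expected, const char *actual,
                             int count)
{
    int mismatches = 0;
    for (int id = 0; id < count; ++id) {
        if (expected[id] != actual[id]) {
            printf("Id: %d expected: %d got: %d\n",
                   id, (int)expected[id], (int)actual[id]);
            ++mismatches;
        }
    }
    return mismatches;
}
```

With the expected flag for an id such as 738 in hand, the "diff" line for that id shows whether the value read back from global memory or the private plus_flag is the one that deviates.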

 

Do you have any idea what I'm doing wrong for this to happen?

 

Thank you for your help,
