1 Reply Latest reply on Aug 25, 2014 4:59 AM by dipak

    Unexpected results on CPU with optimizations enabled

    ikalou

      I want to remove an unnecessary write to global memory (threshold_ptr_plus_flags) at the beginning of my kernel by keeping the value in a private variable (plus_flag) instead. However, while my modified code appears to run correctly on both the CPU and the GPU with "-cl-opt-disable", it breaks on the CPU when optimizations are enabled.

       

      This is my kernel:

       

      __kernel void setup_plus_minus_A(
          const int index,
          __local char* const threshold_ptr_plus_tmp_flags,
          __global char* const threshold_ptr_plus_flags,
          __constant const float* const candidates_value
      )
      {
          char plus_flag;
          const int id = get_global_id(0);
          const int local_id = get_local_id(0);

          if (id > 0) {
              if (id < index && candidates_value[id - 1] == candidates_value[id]) {
                  threshold_ptr_plus_flags[id] = 0;
                  plus_flag = 0;
              } else {
                  threshold_ptr_plus_flags[id] = 1;
                  plus_flag = 1;
              }
          } else {
              threshold_ptr_plus_flags[0] = 0;
              plus_flag = 0;
          }

          barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

          if (threshold_ptr_plus_flags[id] != plus_flag) {
              printf((__constant char *)"AP Id: %d diff: %d vs %d\n", id, (int)threshold_ptr_plus_flags[id], (int)plus_flag);
          }

          threshold_ptr_plus_tmp_flags[local_id] = plus_flag; //threshold_ptr_plus_flags[id]; <- Here I would like to use plus_flag instead of threshold_ptr...
          /* ...not shown here: prefix sum on threshold_ptr_plus_tmp_flags... */

          /* Write results back to global memory */
          threshold_ptr_plus_flags[id] = threshold_ptr_plus_tmp_flags[...];
      }

       

      With optimizations enabled on the CPU, the above kernel gives me incorrect results for some threads and prints the following:

      Id: 738 diff: 1 vs 0

      ...

      Id: 768 diff: 0 vs 1

      ...

       

      Do you have any idea what I'm doing wrong for this to happen?

       

      Thank you for your help,

        • Re: Unexpected results on CPU with optimizations enabled
          dipak

          Hi,

          it breaks on the CPU when optimizations are enabled

          It's hard to comment based on what you've posted. I have a few questions:

          1. Can I assume it works fine on the GPU when optimizations are enabled?

          2. If the problem occurs only on the CPU, what is your CPU setup?

          3. Which versions of the Catalyst driver and APP SDK are you using? Does the problem also occur with other driver/SDK versions?

           

          It would be a great help if you could provide test code that reproduces the problem.
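
          As a possible starting point for such a test, here is a minimal host-side sketch in plain C. It recomputes the flag rule from the posted kernel on the host and compares it with flags read back from the device. The helper names reference_plus_flag and count_mismatches and the device_flags buffer are hypothetical and not part of the original code. The sketch also assumes the flags are read back before the prefix sum overwrites threshold_ptr_plus_flags, for example from a stripped-down kernel that stops after the barrier.

```c
#include <stdio.h>

/* Hypothetical helper: host-side reference for the flag rule in the
 * posted kernel. The flag is 0 for id == 0, 0 when id < index and the
 * value equals its left neighbour, and 1 otherwise. */
static char reference_plus_flag(const float *candidates_value, int index, int id)
{
    if (id == 0) {
        return 0;
    }
    if (id < index && candidates_value[id - 1] == candidates_value[id]) {
        return 0;
    }
    return 1;
}

/* Hypothetical helper: compares flags read back from the device
 * (device_flags, an assumed host copy of threshold_ptr_plus_flags taken
 * before the prefix sum) against the reference for n work-items.
 * Prints each mismatch and returns the number of mismatches. */
static int count_mismatches(const float *candidates_value,
                            const char *device_flags,
                            int index, int n)
{
    int mismatches = 0;
    for (int id = 0; id < n; ++id) {
        char expected = reference_plus_flag(candidates_value, index, id);
        if (device_flags[id] != expected) {
            printf("Id: %d device: %d vs reference: %d\n",
                   id, (int)device_flags[id], (int)expected);
            ++mismatches;
        }
    }
    return mismatches;
}
```

          Running count_mismatches on the buffers read back from the CPU and GPU devices, with and without "-cl-opt-disable", would show whether the flags themselves are already wrong or whether the error only appears in a later step.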

           

          Regards,