1 Reply Latest reply on Jul 10, 2015 3:50 AM by vladimir_1

    OpenCL 1.2 vs OpenCL 2.0 loops

    vladimir_1

      Hi,

      For R9 390 card with 15.20 kernel (latest one)

       

      Given kernel :

       

      #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

      #define VALUE_TYPE float

      #define COMPAT_VALUE_TYPE int

      void my_atomic_add(__global VALUE_TYPE * loc, const VALUE_TYPE f)

      {  

        VALUE_TYPE old = *loc;  

        VALUE_TYPE sum = old + f;  

        while(atomic_cmpxchg(        (__global COMPAT_VALUE_TYPE*)loc, *((COMPAT_VALUE_TYPE*)&old), *((COMPAT_VALUE_TYPE*)&sum)        ) != *((COMPAT_VALUE_TYPE*)&old)) 

        {     old = *loc;     sum = old + f;   }

      }

      __kernel void test_atomic_lock(__global float* output )

      {

        int idx = get_global_id(0);

        my_atomic_add(output, 1);

      }

       

      l

      launched with local/global size 16.

      When compiled in 1.2 compability mode it finishes and produces correct result (16).

      When compiled in 2.0 compatibility mode (build options -cl-std=CL2.0 -D CL_VERSION_2_0) it hangs the card.

       

      built in R7 (spectre) correctly executed in both mode using 14.12 driver and showed same behaviour (1.2 succeeds and 2.0 fails).

       

      It seems that 15.xx driver version introduced some fatal upgrade to the conditional loop handling in 2.0 compatibility mode.

        • Re: OpenCL 1.2 vs OpenCL 2.0 loops
          vladimir_1

          Sorry for double posting, indeed 2.0 mode brings in an  optimizer "enhancement"

          Following kernel works:

           

           

          void my_atomic_add(__global VALUE_TYPE * loc, const VALUE_TYPE f)

           

          {  

           

               VALUE_TYPE old = *loc;  

           

               VALUE_TYPE sum = old + f;  

           

               volatile bool test =true;

           

               while(test) 

           

               {

           

                    test = atomic_cmpxchg(        (__global COMPAT_VALUE_TYPE*)loc, *((COMPAT_VALUE_TYPE*)&old), *     ((COMPAT_VALUE_TYPE*)&sum)        ) != *((COMPAT_VALUE_TYPE*)&old);

           

                    if (test)

           

                    {

           

                         old = *loc;     sum = old + f;

           

                    }

           

               }

           

          }

           

          Notice the volatile before bool =(

           

          Conclusion: optimizer should re-evaluate atomic operations in the while loop conditions.