4 Replies Latest reply on Mar 23, 2013 8:38 AM by himanshu.gautam

    help to the synchronized problem in loop

    via0517

      I tested some code and can't understand the result.

       

      I am working with a uint array x[] and two threads. Here is my code:

       

      __kernel void ssum(__global uint *x )
      {
          int id= get_global_id(0);
          for (int i=0;i<5;i++)
          {
              x[i+id]=x[i+id]*10+id;

              float x=123;
              for (int j=0;j<10000*id;j++) x=tan(x);
          }
      }

       

      The result is 0 10 10 10 10 1.

      But thread 1 must run slower than thread 0, since it performs 10000 tan() operations in every iteration. So why, for every element x[i], does thread 1 always operate first and thread 0 second?

      I am surprised. Is every step in the loop synchronized? Does thread 0 wait while thread 1 is running the tan operations?

        • Re: help to the synchronized problem in loop
          realhet

          float x=123;

          for (int j=0;j<10000*id;j++) x=tan(x);

          This is eliminated completely by the compiler as this code does nothing. It only alters a temp variable which isn't used later.

           

          Also, the two threads run in parallel: when thread 0 calculates x[i+0], thread 1 calculates x[i+1], and so on.

           

          "I am surprised is every step in loop is synchronized"

          It is not 'multitasking', it's Single Instruction Multiple Data. ALU Operations are done simultaneously, and memory IO is somewhat serialized (that can generate stalls and pause the ALU instruction processing).
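
To see how simultaneous execution leads to 0 10 10 10 10 1, here is a minimal host-side sketch in plain C. It is not OpenCL and not a description of the real hardware. It assumes x holds six zeros when the kernel starts (the post does not say) and that in every loop step both work-items load before either stores. The names run_lockstep and run_item0_first are made up for this illustration.

```c
#define NUM_ITEMS 2                       /* work-items, as in the question */
#define NUM_STEPS 5                       /* trip count of the kernel loop  */
#define N (NUM_STEPS + NUM_ITEMS - 1)     /* elements touched: x[0]..x[5]   */

/* Hypothetical model: both work-items execute each loop step together.
 * Assumption: all loads of a step complete before any store of that step. */
void run_lockstep(unsigned x[N])
{
    for (int i = 0; i < NUM_STEPS; i++) {
        unsigned loaded[NUM_ITEMS];
        for (int id = 0; id < NUM_ITEMS; id++)
            loaded[id] = x[i + id];                       /* load phase  */
        for (int id = 0; id < NUM_ITEMS; id++)
            x[i + id] = loaded[id] * 10 + (unsigned)id;   /* store phase */
    }
}

/* Contrast case: work-item 0 finishes its whole loop before work-item 1
 * starts, which is what independent threads of unequal speed might do. */
void run_item0_first(unsigned x[N])
{
    for (int id = 0; id < NUM_ITEMS; id++)
        for (int i = 0; i < NUM_STEPS; i++)
            x[i + id] = x[i + id] * 10 + (unsigned)id;
}
```

Starting from an all-zero array, run_lockstep leaves 0 10 10 10 10 1, which matches the reported result, while run_item0_first leaves 0 1 1 1 1 1. In the lockstep model, work-item 0 at step i always reads the value work-item 1 stored at step i-1, which is why it looks as if "thread 1" always goes first.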

          • Re: help to the synchronized problem in loop
            LeeHowes

            You don't have two threads. You have two work-items. A work-item is not a thread, it is mapped by the compiler and runtime to some underlying thread in an implementation-defined manner. The reality is that we map 64 work-items to a single thread in single instruction multiple data fashion - that means that all 64 work-items execute a single instruction at the same time (on different data, as we see in your example).

             

            When people use the term "thread" for a work-item, that is a matter of convenience for the programming model, not a description of how it maps to the hardware. As your surprise in the post implies, a thread is generally considered to be an independent entity with its own program counter. A single work-item does not have its own PC during execution.
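
The shared program counter can be illustrated with a rough C model of one 64-wide group of work-items running a loop whose trip count differs per work-item, as the tan() loop would if the compiler kept it. This is only a sketch under simplifying assumptions: each lane is either active or masked off in a given step, the group keeps issuing the loop body until no lane is active, and run_divergent_loop and its parameters are hypothetical names, not part of any OpenCL or driver API.

```c
#include <stdbool.h>

#define WAVEFRONT_SIZE 64   /* work-items per hardware thread, per the post above */

/* Hypothetical model of a divergent loop under one shared program counter.
 * trip_count[lane] is how many iterations that lane's own loop needs.
 * executed[lane] counts steps in which the lane did real work,
 * idle[lane] counts steps in which it was masked off but still had to wait.
 * Returns the number of loop-body steps issued for the whole group. */
unsigned run_divergent_loop(const unsigned trip_count[WAVEFRONT_SIZE],
                            unsigned executed[WAVEFRONT_SIZE],
                            unsigned idle[WAVEFRONT_SIZE])
{
    unsigned issued = 0;

    for (int lane = 0; lane < WAVEFRONT_SIZE; lane++)
        executed[lane] = idle[lane] = 0;

    for (unsigned j = 0; ; j++) {
        /* The group leaves the loop only when every lane is done. */
        bool any_active = false;
        for (int lane = 0; lane < WAVEFRONT_SIZE; lane++)
            if (j < trip_count[lane])
                any_active = true;
        if (!any_active)
            break;

        issued++;
        for (int lane = 0; lane < WAVEFRONT_SIZE; lane++) {
            if (j < trip_count[lane])
                executed[lane]++;   /* lane is active for this step        */
            else
                idle[lane]++;       /* lane is masked, cannot run ahead    */
        }
    }
    return issued;
}
```

In this model, if lane k needs 10*k iterations, the group issues 630 steps in total: lane 0 does no work but sits masked for all 630, and lane 63 is active throughout. A lane with a short loop therefore cannot race ahead to the next x[i+id] update while its neighbours are still looping.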