7 Replies Latest reply on Jul 12, 2010 5:17 PM by nou

    Barrier and control flow

    rotor

      Hi everyone,

      I have heard lots and also experienced weird behavior of the program with barrier inside a control flow and since the tech docs does not explain well this so I am seeking for your help. I have several questions on this issue:

      1)what the signal that the barrier() statement results to the control unit to commit that it is executed? is that an interrupt signal or program counter...etc?

      2)can I you this scenario to avoid the problem:

      --the illegal version

      if(id=get_local_id(0)%4)

      {

      loc_data[id]=glob_data[id];

      barrier(CLK_LOCAL_MEM_FENCE);

      }

      do_smt_next++;

      -the modified version

       

      if(id=get_local_id(0)%4)

      {

      loc_data[id]=glob_data[id];

      }

      barrier(CLK_LOCAL_MEM_FENCE);

      do_smt_next++;

       

      ---

      so if I move the barrier out of the if flow, will the threads that does not enter the if condition wait @ the barrier until the threads that enter the if condition finishing the task inside the if condition?

       

      3) what happen with the barrier in branched condition? like this:

       

      if(id=get_local_id(0)%4)

      {

      loc_data[id]=glob_data[id];

      barrier(CLK_LOCAL_MEM_FENCE);

      }

      else

      {

      loc_data[id]=glob_data[id]+4;

      barrier(CLK_LOCAL_MEM_FENCE);

      }



      ---

      since the condition branches out so not all threads will enter either IF or ELSE condition --> what happen with the barrier there?

       

      Thanks,

      Roto

        • Barrier and control flow
          MicahVillmow
          rotor,
          From the OpenCL spec:
          "If barrier is inside a conditional statement, then all
          work-items must enter the conditional if any work-item
          enters the conditional statement and executes the
          barrier.
          If barrer is inside a loop, all work-items must execute
          the barrier for each iteration of the loop before any are
          allowed to continue execution beyond the barrier."

          So the first example is illegal OpenCL code. If you follow the OpenCL spec on how to use the barrier, then there should be no weird behavior. It is only when the spec is violated that undefined behavior occurs.
          • Barrier and control flow
            MicahVillmow
            rotor,
            Every barrier in the source code is unique. If a barrier exists and a work-item hits that barrier, then every work-item must hit that barrier. Barrier maps to the group_barrier ISA instruction on 8XX software and to no-ops on 7XX hardware as we force single wavefront execution. In the second condition and third conditions, if your work-group takes divergent control flow and hits the barrier, either you will get undefined results or your graphics card will lock up.

            So, before you ever put a barrier in control flow, think very hard about whether every work-item will hit the same barrier or not.

              • Barrier and control flow
                rotor

                Hi Micah,

                What I am not very clear from OpenCL spec is the condition for hitting the barrier (i.e. CLK_GLOBAL_MEM_FENCE and CLK_LOCAL_MEM_FENCE). Basically the spec says that these two memory fence barrier insure that all the GLOBAL(local) access is committed before go to next task. Therefore do the work items have to access the GLOBAL (LOCAL) memory to make them hit the barrier (because if you don't do any access to the memory I don't think it will generate any committed signal). 

                For example: what would happen to this barrier

                ....

                local_v1=0;//write to local

                if(id=get_local_id(0)%4)

                {

                glob_data[id]=0;//write to global

                }

                barrier(CLK_GLOBAL_MEM_FENCE);

                ...

                in that code, we set a global memory fence barrier but actually only the work items that have get_local_id(0)%4!=0 do a write to global memory. So will the other work items, that does not do a write to global memory, wait at the barrier barrier(CLK_GLOBAL_MEM_FENCE)? Electronically I think there will no global memory committed signal sent to the work items that do not access to the global memory. What should those work items behave in such a situation?