4 Replies Latest reply on Jan 12, 2011 8:42 PM by dominik_g

    global_work_offset in OpenCL 1.1


      With OpenCL 1.1, the global_work_offset parameter of clEnqueueNDRangeKernel can be used to specify an offset for calculating the global ID of a work-item.

      In my understanding this should mean that both get_global_id() and get_group_id() should be affected by the offset. However, it seems like the AMD implementation only considers the offset for get_global_id() but not for get_group_id().

      Unfortunately I don't have access to NVIDIA's implementation of OpenCL 1.1 to test what they do...

        • global_work_offset in OpenCL 1.1
          From the OpenCL spec:
          "global_work_offset can be used to specify an array of work_dim unsigned values that describe
          the offset used to calculate the global ID of a work-item."

          The formula for calculating global id is:
          (gx , gy) = (wx * Sx + sx + Fx, wy * Sy + sy + Fy)

          w{x,y} = work-group id
          S{x,y} = work-group size
          s{x,y} = local id
          F{x,y} = global ID offset

          All implementations should behave exactly the same way, since the global ID calculation is explicit in the spec.
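To make the formula above concrete, here is a minimal sketch of it for a single dimension in plain C (host-side arithmetic only, not OpenCL kernel code). The function name `global_id` and its parameter names are made up for illustration; they simply mirror w, S, s and F from the formula.

```c
#include <stddef.h>

/* One-dimensional version of the quoted formula: g = w * S + s + F.
 * group_id      -> w (work-group id)
 * local_size    -> S (work-group size)
 * local_id      -> s (local id)
 * global_offset -> F (global ID offset)
 * The function and parameter names are hypothetical, chosen for this sketch. */
size_t global_id(size_t group_id, size_t local_size,
                 size_t local_id, size_t global_offset)
{
    return group_id * local_size + local_id + global_offset;
}
```

For example, with a work-group size of 64 and an offset of 512, work-item 5 of group 2 gets global_id(2, 64, 5, 512), i.e. 2 * 64 + 5 + 512. Note that the offset only enters the global ID; the group id is an input here, not something derived from the offset.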
            • global_work_offset in OpenCL 1.1

              Thanks for your reply, Micah.

              You're right. It's specified like this in the spec, but I don't think it makes much sense... I thought the offsets could be used to easily split a task between multiple devices, for example. Say you want to execute half of the task on GPU1 and the rest on GPU2. Then you could simply specify the offset for GPU2 as half of the overall problem size. But if the kernel uses get_group_id(), this wouldn't work...

              I always assumed that get_global_id() / get_local_size() == get_group_id(). But apparently that's no longer the case...

              Is there any specific reason for this? What else could the offset be used for other than splitting tasks?
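The relation discussed here can be sketched in plain C (again host-side arithmetic, not kernel code). Assuming the spec formula g = w * S + s + F, the group id is recovered from the global ID only after the offset is subtracted; dividing the raw global ID by the work-group size gives a different number whenever the offset is nonzero. Both function names are hypothetical. Inside a kernel, the offset itself is available through the OpenCL 1.1 built-in get_global_offset().

```c
#include <stddef.h>

/* What get_group_id() corresponds to under g = w * S + s + F:
 * subtract the offset first, then divide by the work-group size. */
size_t group_id_from_global(size_t global_id, size_t global_offset,
                            size_t local_size)
{
    return (global_id - global_offset) / local_size;
}

/* The pre-1.1 assumption: global ID divided by work-group size.
 * This equals the group id only when the offset is zero. */
size_t naive_group_id(size_t global_id, size_t local_size)
{
    return global_id / local_size;
}
```

With a work-group size of 64 and an offset of 512, the work-item with global ID 645 belongs to group (645 - 512) / 64 of its launch, while the naive division 645 / 64 yields a larger index.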

            • global_work_offset in OpenCL 1.1
              get_group_id() returns the ID of the group within your NDRange. Offsetting the NDRange domain doesn't change the number of groups executed in a launch, so the group IDs still start at 0. The calculation is also defined this way because it matches how the hardware works: div/mod is expensive, so the hardware builds the global ID from the local and group IDs, not the other way around.
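For the multi-device split described earlier in the thread, one possible workaround is to add the offset back in by hand. The sketch below is plain C and rests on an assumption: the offset passed to each launch is a multiple of the work-group size. The name `absolute_group_id` is hypothetical. In a kernel, the equivalent expression would be get_group_id(0) + get_global_offset(0) / get_local_size(0), using the OpenCL 1.1 built-in get_global_offset().

```c
#include <stddef.h>

/* Group index counted from the start of the whole problem instead of
 * from the start of this launch.
 * Assumption: global_offset is a multiple of local_size, so the
 * division is exact. The function name is made up for this sketch. */
size_t absolute_group_id(size_t group_id, size_t global_offset,
                         size_t local_size)
{
    return group_id + global_offset / local_size;
}
```

Say the problem has 1024 work-items with a work-group size of 64, and the second GPU runs the upper half with an offset of 512. Its launch sees group IDs 0 to 7, and absolute_group_id maps them to 8 to 15, which is the numbering a single launch over the whole range would have produced.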