The AMD OpenCL programming guide (page 69) states that work-groups retire in order on the HD 5000 series GPUs. Is this behavior defined in the OpenCL specification, or does it apply only to the HD 5000 series?
One or more work-groups execute on each compute unit. On the ATI Radeon™
HD 5000-series GPUs, work-groups are dispatched in a linear order, with x
changing most rapidly. For a single dimension, this is:
DispatchOrder = get_group_id(0)
For two dimensions, this is:
DispatchOrder = get_group_id(0) + get_group_id(1) * get_num_groups(0)
This is row-major-ordering of the blocks in the index space. Once all compute
units are in use, additional work-groups are assigned to compute units as
needed. Work-groups retire in order, so active work-groups are contiguous.
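To make the quoted formulas concrete, here is a minimal host-side sketch in plain C (not kernel code). The function names `dispatch_order_1d` and `dispatch_order_2d` are made up for illustration; their arguments stand in for the values that `get_group_id(0)`, `get_group_id(1)` and `get_num_groups(0)` would return inside a kernel. It only models the ordering the guide describes for the HD 5000 series and says nothing about other devices.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side model of the quoted 1-D formula: the dispatch order is
 * simply the work-group id in dimension 0. */
size_t dispatch_order_1d(size_t group_id_x)
{
    return group_id_x;
}

/* Host-side model of the quoted 2-D formula (row-major order, x changing
 * most rapidly). The arguments are stand-ins for get_group_id(0),
 * get_group_id(1) and get_num_groups(0). */
size_t dispatch_order_2d(size_t group_id_x, size_t group_id_y,
                         size_t num_groups_x)
{
    assert(group_id_x < num_groups_x); /* x id must lie inside the row */
    return group_id_x + group_id_y * num_groups_x;
}
```

With four work-groups per row, for example, the group at (x = 0, y = 1) follows directly after the group at (x = 3, y = 0).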
I have a kernel that takes two arrays of strings. It takes one string from array A and searches for a match in array B. The end result is a global array of ints used as booleans (not optimal, but easy, and my graphics card is only OpenCL 1.0 compliant). A floating-point score is then calculated from this boolean array. That part of the code is highly sequential because it involves running sums, but I would still like to calculate the score in the kernel. My idea is that, if work-groups retire in order, I could use a local barrier and then have the last work-item perform the sequential score calculation.
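To show the kind of dependency that makes the scoring step sequential, here is a rough host-side sketch in plain C. The actual scoring formula is not given above, so the one below (each match adds the running match count divided by its 1-based position) is purely a placeholder assumption, and `sequential_score` is a hypothetical name. The point is only that each step needs the running sum from the previous step.

```c
#include <stddef.h>

/* Placeholder score, NOT the real formula: walks the boolean result array
 * in order and keeps a running count of matches. Each match contributes
 * (matches so far) / (1-based position), so iteration i depends on the
 * running sum produced by iterations 0..i-1. */
float sequential_score(const int *matched, size_t n)
{
    float score = 0.0f;
    int running = 0; /* running sum of matches seen so far */

    for (size_t i = 0; i < n; ++i) {
        if (matched[i]) {
            running += 1;
            score += (float)running / (float)(i + 1);
        }
    }
    return score;
}
```

This pass can only produce a correct result once every entry of the boolean array has been written, which is why the order in which work-groups finish matters for the idea described above.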