IMHO this is optimized and get_global_id is not computed each time.
They are cheap, and they are calculated only once, at the beginning of a kernel (and only if you call them in your kernel).
For example, on the 7970 it costs only a single MAD to acquire a zero-based linear thread id. All other ids are calculated from this linear id using a hidden constant buffer filled with your NDRange's parameters.
If you use only get_global_id(0), it will take about 8 instructions.
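To make the arithmetic above a bit more concrete, here is a rough host-side sketch in plain C of how a linear id could be formed with one multiply-add and then split into per-dimension ids. This is only an illustration, not the actual code the compiler emits: the struct `ndrange_params` (standing in for the hidden constant buffer), the functions `linear_id_mad` and `ids_from_linear`, and the assumption that dimension 0 varies fastest are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hidden constant buffer that holds the
 * NDRange parameters (names and layout are assumptions for this sketch). */
typedef struct {
    size_t global_size[3];
    size_t global_offset[3];
} ndrange_params;

typedef struct {
    size_t id[3];
} global_ids;

/* One multiply-add: zero-based linear id from a work-group index,
 * the work-group size, and the index inside the work-group (1D case). */
size_t linear_id_mad(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Split a zero-based linear id into per-dimension global ids,
 * assuming dimension 0 varies fastest, then add the global offsets. */
global_ids ids_from_linear(size_t linear, const ndrange_params *p)
{
    global_ids out;
    size_t sx = p->global_size[0];
    size_t sy = p->global_size[1];
    size_t sz = p->global_size[2];

    assert(sx > 0 && sy > 0 && sz > 0);
    assert(linear < sx * sy * sz);

    out.id[0] = linear % sx + p->global_offset[0];
    out.id[1] = (linear / sx) % sy + p->global_offset[1];
    out.id[2] = linear / (sx * sy) + p->global_offset[2];
    return out;
}
```

In the 1D case with a zero offset, the first component is just the linear id itself, which fits the point that using only get_global_id(0) stays cheap.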