I tested some code and can't understand the result.
I am working with a uint array x and two threads. Here is my code:
__kernel void ssum(__global uint *x)
{
    int id = get_global_id(0);
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 10000 * id; j++) x=tan(x);
}
The result is 0 10 10 10 10 1
But thread 1 should run slower than thread 0, because it has to execute 10000 tan() operations. So why is every element x[i] always operated on by thread 1 first and then by thread 0?
I am surprised. Is every step of the loop synchronized? Does thread 0 wait while thread 1 is running its tan() operations?