Hi everyone,
I have heard lots and also experienced weird behavior of the program with barrier inside a control flow and since the tech docs does not explain well this so I am seeking for your help. I have several questions on this issue:
1)what the signal that the barrier() statement results to the control unit to commit that it is executed? is that an interrupt signal or program counter...etc?
2)can I you this scenario to avoid the problem:
--the illegal version
if(id=get_local_id(0)%4)
{
loc_data[id]=glob_data[id];
barrier(CLK_LOCAL_MEM_FENCE);
}
do_smt_next++;
-the modified version
if(id=get_local_id(0)%4)
{
loc_data[id]=glob_data[id];
}
barrier(CLK_LOCAL_MEM_FENCE);
do_smt_next++;
---
so if I move the barrier out of the if flow, will the threads that does not enter the if condition wait @ the barrier until the threads that enter the if condition finishing the task inside the if condition?
3) what happen with the barrier in branched condition? like this:
if(id=get_local_id(0)%4)
{
loc_data[id]=glob_data[id];
barrier(CLK_LOCAL_MEM_FENCE);
}
else
{
loc_data[id]=glob_data[id]+4;
barrier(CLK_LOCAL_MEM_FENCE);
}
---
since the condition branches out so not all threads will enter either IF or ELSE condition --> what happen with the barrier there?
Thanks,
Roto
Thanks Micah for very quick response . can you check my updated question again and give me some ideas on those 3 question? Is there a general way to ovoid this?
Thanks,
Roto
Hi Micah,
What I am not very clear from OpenCL spec is the condition for hitting the barrier (i.e. CLK_GLOBAL_MEM_FENCE and CLK_LOCAL_MEM_FENCE). Basically the spec says that these two memory fence barrier insure that all the GLOBAL(local) access is committed before go to next task. Therefore do the work items have to access the GLOBAL (LOCAL) memory to make them hit the barrier (because if you don't do any access to the memory I don't think it will generate any committed signal).
For example: what would happen to this barrier
....
local_v1=0;//write to local
if(id=get_local_id(0)%4)
{
glob_data[id]=0;//write to global
}
barrier(CLK_GLOBAL_MEM_FENCE);
...
in that code, we set a global memory fence barrier but actually only the work items that have get_local_id(0)%4!=0 do a write to global memory. So will the other work items, that does not do a write to global memory, wait at the barrier barrier(CLK_GLOBAL_MEM_FENCE)? Electronically I think there will no global memory committed signal sent to the work items that do not access to the global memory. What should those work items behave in such a situation?
but all work items in work group will wait to other work item to commit memory.
Hi Nou,
But what happen if only some of work items will commit global memory and some will never or very late after. Will all the items be stalled?
Thanks,
Roto
with barrier()? yes.