cantallo

local subvector write is in fact read-modify-write of the full vector.

Discussion created by cantallo on Jul 11, 2012
Latest reply on Jul 17, 2012 by notzed

I want to share this "Bug" because OpenCL specs are all but clear on that point.

 

I wanted to deinterlace complex-valued signal and put it into a __local array of vectors (type float4)

 

__local float4 re[16],im[16];

__private float4 d;

(...)

d=vload4(get_global_id(0),...);

if ((get_global_id(0)&1)==0)

{

re[get_local_id(0)>>1].lo=d.even;

im[get_local_id(0)>>1].lo=d.odd:

}

else

{

re[get_local_id(0)>>1].hi=d.even;

im[get_local_id(0)>>1].hi=d.odd:

}

(...)

 

Coded like that, all .x .y components of the re & im arrays where uninitialized because assignment to the low (resp. high) half of a vector in local memory is in fact (onV7900) implemented as a full vector read by the thread, modification of the vector half and write of the full vector. I.e. the code above is implemented in a way similar to:

 

__local float4 re[16],im[16];

__private float4 d,tmp;

(...)

d=vload4(get_global_id(0),...);


if ((get_global_id(0)&1)==0)

{

tmp=re[get_local_id(0)>>1];

tmp.lo=d.even;

re[get_local_id(0)>>1]=tmp;

tmp=im[get_local_id(0)>>1];

tmp.lo=d.odd;

im[get_local_id(0)>>1]=tmp:

}

else

{

tmp=re[get_local_id(0)>>1];

tmp.hi=d.even;

re[get_local_id(0)>>1]=tmp;

tmp=im[get_local_id(0)>>1];

tmp.hi=d.odd:

im[get_local_id(0)>>1]=tmp;

}

(...)

 

which fails because any even thread and the next odd thread are executed simultaneously (there are on the same 1/4 wavefront).

 

putting a barrier on local memory between even and odd thread execution solves the problem:

 

__local float4 re[16],im[16];

__private float4 d;

(...)

d=vload4(get_global_id(0),...);

if ((get_global_id(0)&1)==0)

{

re[get_local_id(0)>>1].lo=d.even;

im[get_local_id(0)>>1].lo=d.odd:

}

barrier(CLK_LOCAL_MEM_FENCE);

if ((get_global_id(0)&1)==1)

{

re[get_local_id(0)>>1].hi=d.even;

im[get_local_id(0)>>1].hi=d.odd:

}

(...)

 

I think this point may be usefull to other:

 

if v is e.g. float4, v.hi is a float2 for reading but not as an lvalue

v.hi=something; is simply a short for v=select(v,(float4)(something,something),(unit4)(0,0,-1,-1));

Outcomes