
lupescu_grigore
Journeyman III

Wrong CPU output

Simple erosion/dilation algorithm gives wrong output

I've written a simple program that pipelines 3 kernels (bitsplicing -> dilation -> erosion). The test image was Lena (not that it should matter). I use a threshold of 160 to set each pixel to 0 or 1 before dilation/erosion.
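For readers following along, the thresholding step described above can be sketched in plain C. This is only an illustration, not the poster's kernel: the function name `binarize`, the strict `>` comparison, and the 0/255 output encoding (inferred from the `/255` in the kernel) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: binarize a grayscale image.
 * Pixels strictly above `threshold` become 255 (bit 1), all others 0.
 * The strict comparison is an assumption; the post does not say. */
void binarize(const int *input, int *output, size_t count, int threshold)
{
    assert(input != NULL && output != NULL);
    for (size_t i = 0; i < count; ++i) {
        output[i] = (input[i] > threshold) ? 255 : 0;
    }
}
```

Calling `binarize(pixels, bits, width * height, 160)` would then produce the 0/255 image that the dilation and erosion steps expect.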

In both the dilation and erosion algorithms I get a fully black/white image on the CPU if I set all four neighbour pixels (up, down, left, right). If I set only three of them (e.g. up, left, right) I get a semi-correct result. It looks as if the algorithm overwrites values, the way a for loop working in place on a single image would, but I have two images (input, output).

Is this a compiler error or am I missing something?

Later edit:

The GPU (HD5470) worked without a problem. The CPU is a Core i3. OpenCL SDK 2.2, Ubuntu Linux 10.10 64-bit.

 

I've added the kernel code to make clear how I do dilation or erosion (& instead of |), though it should make no difference since the GPU output is OK while the CPU output is not.

 

__kernel void dilate(__global int* input, __global int* output, int width, int height)
{
    uint x = get_global_id(0);
    uint y = get_global_id(1);

    // frame width
    int frameW = LEVEL;
    // frame array
    int frameA = LEVEL * LEVEL;

    int frameLeft = x;
    int frameBottom = y;

    int frameTop = ((x + frameW) < (width)) ? (x + frameW) : (width);
    int frameRight = ((y + frameW) < (height)) ? (y + frameW) : (height);

    // get center value (1/0)
    int center = input[(frameLeft + 1) + width * (frameBottom + 1)] / 255;

    // dilate
    output[(frameLeft + 1) + width * (frameBottom + 1)] = center * 255; // center (1,1)

    output[(frameLeft + 0) + width * (frameBottom + 1)] = ((input[(frameLeft + 0) + width * (frameBottom + 1)] / 255) | center) * 255; // up (0,1)
    output[(frameLeft + 1) + width * (frameBottom + 2)] = ((input[(frameLeft + 1) + width * (frameBottom + 2)] / 255) | center) * 255; // down (2,1)
    output[(frameLeft + 1) + width * (frameBottom + 0)] = ((input[(frameLeft + 1) + width * (frameBottom + 0)] / 255) | center) * 255; // left (1,0)
    output[(frameLeft + 2) + width * (frameBottom + 1)] = ((input[(frameLeft + 2) + width * (frameBottom + 1)] / 255) | center) * 255; // right (1,2)
}
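As a point of comparison, the kernel above scatters: each work-item writes its center and its four neighbours, so neighbouring work-items write to the same output elements. One way to sidestep overlapping writes is a gather formulation, where every output pixel is written exactly once from reads of the input. The sketch below is a plain C reference, not the poster's code; the names `pixel_bit` and `dilate_gather` are hypothetical, and treating out-of-image pixels as 0 is an assumption.

```c
#include <assert.h>

/* Read a pixel as a 0/1 bit; anything outside the image counts as 0
 * (an assumed border policy). */
static int pixel_bit(const int *img, int width, int height, int x, int y)
{
    if (x < 0 || y < 0 || x >= width || y >= height) {
        return 0;
    }
    return img[x + width * y] / 255;
}

/* Gather-style 4-neighbour dilation over a 0/255 image.
 * Each output pixel is the OR of itself and its four neighbours. */
void dilate_gather(const int *input, int *output, int width, int height)
{
    assert(input != output); /* separate buffers are required */
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int bit = pixel_bit(input, width, height, x, y)
                    | pixel_bit(input, width, height, x - 1, y)
                    | pixel_bit(input, width, height, x + 1, y)
                    | pixel_bit(input, width, height, x, y - 1)
                    | pixel_bit(input, width, height, x, y + 1);
            output[x + width * y] = bit * 255;
        }
    }
}
```

In an OpenCL kernel the two loops would be replaced by `get_global_id(0)` and `get_global_id(1)`. An erosion variant would use `&` instead of `|`, with a border policy chosen to match.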

 

 

4 Replies
lupescu_grigore
Journeyman III

I managed to find the problem. When pipelining, I passed the same memory address (output) to the kernel as both input and output (big oops). The CPU failed because it was reading from and writing to the same memory, while the GPU executed correctly. What's more interesting is that the GPU managed to read correctly from the output memory, which was of type WRITE_ONLY.

Hence, my bad, but interestingly enough the GPU still produced the intended result.
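The effect of passing the same buffer as input and output can be reproduced in a few lines of sequential C. This is a minimal 1-D sketch under assumptions (the function name `dilate_row` is hypothetical, and a sequential loop stands in for the CPU device), not the code from the post.

```c
#include <assert.h>

/* 1-D dilation over 0/255 pixels. `in` and `out` are deliberately
 * allowed to alias so the hazard can be demonstrated. */
void dilate_row(const int *in, int *out, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        int left  = (i > 0)     ? in[i - 1] / 255 : 0;
        int mid   = in[i] / 255;
        int right = (i < n - 1) ? in[i + 1] / 255 : 0;
        /* If out == in, this write is seen as `left` by the next iteration. */
        out[i] = (left | mid | right) * 255;
    }
}
```

With separate buffers, a single set pixel at index 0 grows by one pixel, as expected. With `dilate_row(buf, buf, n)`, each freshly written pixel feeds the next iteration and the whole row ends up set, which matches the fully white image described in the question.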


lupescu_grigore,

Thanks for sharing this issue.

Reading from a buffer created with the CL_MEM_WRITE_ONLY flag should raise an error. The issue has been reported to the developers.


If I use only CL_MEM_READ_WRITE instead of READ_ONLY + WRITE_ONLY, how does that affect the app? Is there a performance difference?


I checked by using a clEnqueueWriteBuffer command to write 4 KB of data over 25 iterations.

The mean times were:

0.114 ms for CL_MEM_READ_WRITE

0.122 ms for CL_MEM_WRITE_ONLY

So it seems the flags don't affect the performance of this transfer much.
