# Can anyone give me some advice on optimizing my OpenCL code?

Discussion created by zhuzxy on Sep 19, 2011
Latest reply on Sep 25, 2011 by notzed

My code needs to do some calculations for every point in the image. In my OpenCL implementation, shown below, each work-item handles one point. The variable 'pixel' is an int16 vector: it is first initialized from global memory (the image), and its values are then compared against two thresholds.

// initialize the pixel from the image data.

pixel.s0 = src_image[ x_value1 + (y_value1)* image_width];
pixel.s1 = src_image[ x_value2 + 1 + (y_value2) * image_width];
...

pixel.se = src_image[ x_valuee + (y_valuee)* image_width];
pixel.sf =  src_image[x_valuef + (y_valuef)* image_width];

corner_decision = false;

// decide whether there are 9 contiguous points larger/smaller than the thresholds.

// cb and c_b are the thresholds

int8 data=pixel.s01234567;

corner_decision  = (corner_decision  | ( ( ( all(data > cb)  && ( (pixel.s8 > cb.s0) ||(pixel.sf > cb.s0))) ||
( all(data < c_b) && ( (pixel.s8 < c_b.s0)||(pixel.sf <  c_b.s0))) ) ));

data=pixel.s23456789;

...

data=pixel.s89abcdef;

corner_decision  = (corner_decision  | ( ( ( all(data > cb)  && ( (pixel.s0 > cb.s0) ||(pixel.s7 > cb.s0))) ||
( all(data < c_b) && ( (pixel.s0 < c_b.s0)||(pixel.s7 <  c_b.s0))) ) ));

...

if (corner_decision != false)

final_res[pos] = 1;

return;
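
For reference, the decision that the fragments above compute can also be written as a scalar loop. The following is only a minimal C sketch, not the kernel itself: the function name `has_contiguous_run`, the constants `RING` and `RUN`, and the use of scalar thresholds (standing in for `cb.s0` and `c_b.s0`) are assumptions made for illustration.

```c
#include <stdbool.h>

#define RING 16  /* points sampled per work-item, matching the 16 lanes of 'pixel' */
#define RUN  9   /* number of contiguous points required */

/* Returns true if RUN circularly contiguous entries of ring[] are all
 * greater than cb, or all less than c_b. */
static bool has_contiguous_run(const int ring[RING], int cb, int c_b)
{
    int brighter = 0;  /* length of the current run of values > cb  */
    int darker   = 0;  /* length of the current run of values < c_b */

    /* Walk RING + RUN - 1 positions so that runs wrapping past the
     * last index back to index 0 are also detected. */
    for (int i = 0; i < RING + RUN - 1; ++i) {
        int v = ring[i % RING];
        brighter = (v > cb)  ? brighter + 1 : 0;
        darker   = (v < c_b) ? darker + 1   : 0;
        if (brighter >= RUN || darker >= RUN)
            return true;
    }
    return false;
}
```

A scalar version like this may be useful as a CPU reference for checking that the vectorized kernel produces the same decisions.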

The problem is that when I run it on the GPU, the performance is no better than a single CPU core on the A8-3850 platform. Could anyone suggest some directions for optimization?