# optimization of a background extraction algorithm

Discussion created by mux85 on Nov 2, 2010
Latest reply on Nov 8, 2010 by mux85

Hi, I'm working on a simple background extraction algorithm. The input is a series of frames (images whose pixels are in RGB format) passed to the kernel as a flat array. The background is extracted by computing the median of the pixels at the corresponding position in all the frames (e.g. the pixel at position (0,0) of the background is the median of the pixels at position (0,0) of all the input frames). The result is also downscaled in a very simple way (dimensions halved in both directions): every pixel of the result is computed as the average of 4 pixels (2x2). Everything works fine, and now I want to do some optimization to get better performance and maybe eliminate the for loop, which makes the kernel work only on the CPU (as discussed in this thread http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=133698&enterthread=y).

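```c
#define MAX_FRAMES 32

// Swap two pixels in place.
void swap(uchar3 *a, uchar3 *b)
{
    uchar3 t = *a;
    *a = *b;
    *b = t;
}

// Approximate luminance of an RGB pixel, used as the sort key.
float rgbToLum(uchar3 pix)
{
    return 0.3f*pix.x + 0.59f*pix.y + 0.11f*pix.z;
}

// Bubble sort of n pixels by ascending luminance.
void sort(uchar3 *v, uint n)
{
    bool swapped = true;
    // The n > 1 test avoids the unsigned underflow of n-1 when n == 0.
    while (swapped && n > 1) {
        swapped = false;
        for (uint i = 0; i < n-1; i++) {
            if (rgbToLum(v[i]) > rgbToLum(v[i+1])) {
                swap(&v[i], &v[i+1]);
                swapped = true;
            }
        }
        n = n - 1;
    }
}

// Median by luminance; sorts v in place.
uchar3 median(uchar3 *v, uint n)
{
    sort(v, n);
    return v[n/2];
}

// Average of a 2x2 block. Summing in 16 bits before dividing avoids
// the per-pixel truncation of m[0]/4 + m[1]/4 + m[2]/4 + m[3]/4.
uchar3 average(uchar3 *m)
{
    ushort3 sum = convert_ushort3(m[0]) + convert_ushort3(m[1])
                + convert_ushort3(m[2]) + convert_ushort3(m[3]);
    return convert_uchar3(sum / (ushort)4);
}

// read_only / write_only are image access qualifiers, so the buffer
// arguments use const instead and the scalar n takes no qualifier.
__kernel void BackgroundKernel(__global const uchar *buf_in,
                               __global uchar *buf_bg,
                               uint n)
{
    uint i = get_global_id(0);
    uint j = get_global_id(1);
    uint h_bg = get_global_size(0);
    uint w_bg = get_global_size(1);
    uint h_in = h_bg*2;
    uint w_in = w_bg*2;
    uint frame_size = w_in*h_in;
    uint pos_in = w_in*i*2 + j*2;
    uint pos_bg = w_bg*i + j;

    // temp holds at most MAX_FRAMES pixels, so clamp the frame count.
    n = min(n, (uint)MAX_FRAMES);

    // down-scaling of the frames
    uchar3 matr[4];
    uchar3 temp[MAX_FRAMES];
    for (uint k = 0; k < n; k++) {
        matr[0] = vload3(pos_in, buf_in);
        matr[1] = vload3(pos_in+1, buf_in);
        matr[2] = vload3(pos_in+w_in, buf_in);
        matr[3] = vload3(pos_in+w_in+1, buf_in);
        temp[k] = average(matr);
        pos_in += frame_size;
    }

    // median of the frames used as bg
    vstore3(median(temp, n), pos_bg, buf_bg);
}
```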
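When optimizing a kernel like this, it can help to have a plain C reference on the host to compare results against. The following is only a minimal sketch, assuming the same packed RGB layout (3 bytes per pixel, frames stored one after another) and the same 0.3/0.59/0.11 luminance weights. The names `rgb_t`, `average4` and `reference_background` are hypothetical and not part of the original code. It uses an insertion sort rather than the bubble sort; both are stable, so the median pixel should be the same. It sums the four pixels before dividing, so its output can differ by a few levels from a version that divides each pixel by 4 first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 32

typedef struct { uint8_t r, g, b; } rgb_t;   /* hypothetical pixel type */

/* Same luminance weights as the kernel's rgbToLum. */
static float rgb_to_lum(rgb_t p)
{
    return 0.3f * p.r + 0.59f * p.g + 0.11f * p.b;
}

/* Read pixel number idx from a packed RGB buffer. */
static rgb_t load_rgb(const uint8_t *buf, size_t idx)
{
    rgb_t p = { buf[3 * idx], buf[3 * idx + 1], buf[3 * idx + 2] };
    return p;
}

/* Average four pixels, summing in int before dividing. */
static rgb_t average4(rgb_t a, rgb_t b, rgb_t c, rgb_t d)
{
    rgb_t o;
    o.r = (uint8_t)((a.r + b.r + c.r + d.r) / 4);
    o.g = (uint8_t)((a.g + b.g + c.g + d.g) / 4);
    o.b = (uint8_t)((a.b + b.b + c.b + d.b) / 4);
    return o;
}

/* Stable insertion sort by ascending luminance. */
static void sort_by_lum(rgb_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        rgb_t key = v[i];
        float key_lum = rgb_to_lum(key);
        size_t j = i;
        while (j > 0 && rgb_to_lum(v[j - 1]) > key_lum) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/*
 * in:  n frames of w_in x h_in packed RGB pixels, stored back to back.
 * out: (w_in/2) x (h_in/2) packed RGB pixels.
 * Returns 0 on success, -1 if the arguments are out of range.
 */
int reference_background(const uint8_t *in, uint8_t *out,
                         size_t w_in, size_t h_in, size_t n)
{
    assert(in != NULL && out != NULL);
    if (n == 0 || n > MAX_FRAMES || w_in % 2 != 0 || h_in % 2 != 0)
        return -1;

    size_t w_bg = w_in / 2, h_bg = h_in / 2;
    size_t frame_size = w_in * h_in;          /* pixels per frame */

    for (size_t i = 0; i < h_bg; i++) {
        for (size_t j = 0; j < w_bg; j++) {
            rgb_t temp[MAX_FRAMES];
            size_t p = w_in * 2 * i + 2 * j;  /* top-left of the 2x2 block */

            /* Downscale the 2x2 block in every frame. */
            for (size_t k = 0; k < n; k++) {
                const uint8_t *f = in + 3 * k * frame_size;
                temp[k] = average4(load_rgb(f, p), load_rgb(f, p + 1),
                                   load_rgb(f, p + w_in),
                                   load_rgb(f, p + w_in + 1));
            }

            /* Median by luminance across the frames. */
            sort_by_lum(temp, n);
            rgb_t m = temp[n / 2];
            size_t q = 3 * (w_bg * i + j);
            out[q] = m.r;
            out[q + 1] = m.g;
            out[q + 2] = m.b;
        }
    }
    return 0;
}
```

Calling `reference_background` on the same input that is passed to the kernel and comparing the two output buffers byte by byte gives a quick check that an optimized version still computes the same background.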