
Good way to calculate min / max values out of 2x2 areas

Hi Guys,

I'm a computer science student and pretty new to GPU programming. I have an ATI/AMD 4870 GPU (I think without image support) and I want to compute a large number of min/max mipmaps on it.

Let's say we have a greyscale image with 4x4 pixels. We take each 2x2 area and save its maximum value into one pixel of another image. The same is done with the minimum value. So the result is two 2x2 images, one containing the max data and one containing the min data. This is very useful for ray tracing.

Here is a small example (blue pixels are the beginning of a 2x2 area, each square is a pixel):


The images are square with power-of-2 dimensions (256*256, 512*512, 1024*1024 etc.).

I'm looking for a good way to implement it. My problem is binding a kernel to the right areas of memory and skipping the odd rows and columns. Imagine this is our image; I want only 0, 2, 8 and 10 to be the start points of a kernel call:

 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15



Any good ideas on how I can do this? In CUDA, if statements are pretty expensive. And the output pixel index is awkward to compute, since 2x2 pixels become one. My idea:

// dimension is the width (and also the height, since the image is square with a power-of-2 size)
// note: "kernel" is a reserved word in OpenCL C, so the function needs another name
__kernel void minmax_2x2(__global uchar* input, uint dimension, __global uchar* output_max, __global uchar* output_min)
{
     uint position_x = whatfunction(); // placeholder: which function gives the position?

     if ((position_x % 2) == 0)
     {
          if (input[position_x] > input[position_x + 1] && input[position_x] > input[position_x + dimension + 1] && input[position_x] > input[position_x + dimension])
          {
               output_max[position_x / 2] = input[position_x];
          }
          // and so on
     }
}



2 Replies

Use the min/max functions defined in the OpenCL language. You can also use ternary operators here.

I would suggest to write the 2 results (max and min) in different arrays, to have coalesced writes.
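As a rough illustration of the two suggestions above, here is a minimal sketch in plain C of reducing one 2x2 block with ternary operators and writing the minimum and maximum to two separate arrays. The names `min_u8`, `max_u8` and `reduce_block` are made up for this example; inside an OpenCL kernel the built-in `min()` and `max()` could take the place of the two helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Ternary-based helpers (hypothetical names). In OpenCL C the built-in
   min() and max() could be used instead. */
uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }
uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Reduce the four pixels of one 2x2 block to their minimum and maximum.
   The two results go to separate arrays at the same index, so neighbouring
   work-items would write to neighbouring addresses in each array. */
void reduce_block(uint8_t p00, uint8_t p10, uint8_t p01, uint8_t p11,
                  uint8_t *out_min, uint8_t *out_max, size_t out_index)
{
    out_min[out_index] = min_u8(min_u8(p00, p10), min_u8(p01, p11));
    out_max[out_index] = max_u8(max_u8(p00, p10), max_u8(p01, p11));
}
```

Each result needs only three comparisons, and there is no branch on the pixel values beyond the ternary selects.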


Also, you can eliminate the IF at the start by setting the size of the kernel domain according to the size of the result data.

So at every result position (x, y), gather the four inputs at (x*2,y*2), (x*2+1,y*2), (x*2,y*2+1), (x*2+1,y*2+1) into temporary variables, compute the min and max of them as gautam.himanshu suggested, and write the results.
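The indexing described above could look roughly like the following sketch, written as plain C so it can be checked on the CPU. The function names `minmax_pixel` and `minmax_level` are hypothetical. In a real OpenCL kernel the two loops would disappear: the kernel would be launched over a (dimension/2) x (dimension/2) domain, and `x` and `y` would come from `get_global_id(0)` and `get_global_id(1)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Work done for ONE output pixel (x, y); this is what the kernel body
   would contain. The input is a row-major dimension x dimension image. */
void minmax_pixel(const uint8_t *input, size_t dimension, size_t x, size_t y,
                  uint8_t *output_min, uint8_t *output_max)
{
    size_t top = (2 * y) * dimension + 2 * x;  /* top-left pixel of the 2x2 area */
    size_t bottom = top + dimension;           /* pixel directly below it */

    /* Gather the four inputs into temporaries. */
    uint8_t a = input[top],    b = input[top + 1];
    uint8_t c = input[bottom], d = input[bottom + 1];

    uint8_t lo_top = a < b ? a : b, hi_top = a > b ? a : b;
    uint8_t lo_bot = c < d ? c : d, hi_bot = c > d ? c : d;

    size_t out = y * (dimension / 2) + x;      /* output rows are dimension/2 wide */
    output_min[out] = lo_top < lo_bot ? lo_top : lo_bot;
    output_max[out] = hi_top > hi_bot ? hi_top : hi_bot;
}

/* CPU stand-in for the kernel launch: one call per OUTPUT pixel, so no
   odd/even test on the position is needed. */
void minmax_level(const uint8_t *input, size_t dimension,
                  uint8_t *output_min, uint8_t *output_max)
{
    assert(dimension >= 2 && dimension % 2 == 0);
    size_t half = dimension / 2;
    for (size_t y = 0; y < half; ++y)
        for (size_t x = 0; x < half; ++x)
            minmax_pixel(input, dimension, x, y, output_min, output_max);
}
```

For a 4x4 input holding the values 0 to 15 in row-major order, this yields the minima 0, 2, 8, 10 and the maxima 5, 7, 13, 15, which matches the start points from the question.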

(On the HD7xxx I've found an instruction that gathers 2x2 pixel channels (one channel at a time) from a texture. It's a sort of reduced bilinear sampler, and it's perfect for your task.)