
finding median in array
meetajinkya88 Dec 7, 2011 6:33 AM (in response to meetajinkya88) The array has around 500,000 elements.

LeeHowes Dec 7, 2011 2:56 PM (in response to meetajinkya88) Intuitively, it would surprise me if there were a way to do this without extracting all the information you would get from a sort. You need the data conceptually ordered, and the only way to get that order may be to sort. Rather than implementing your own algorithm, I would think the best thing would be to use a fast sort and then extract the median from the sorted data set.

meetajinkya88 Dec 8, 2011 12:16 AM (in response to LeeHowes) Sir,
I only need the median value, with minimum complexity; that's it. I don't need to sort the whole array for that (I guess). So instead of sorting the whole array and then calculating the median, I think we can partially sort the array.



eugenek Dec 8, 2011 7:36 AM (in response to meetajinkya88) You are looking for an algorithm called "median of medians"; there is an article on Wikipedia. I'm not aware of any GPU-accelerated implementations, unfortunately. If you can afford to copy the array to the host, the C++ STL algorithm "nth_element" will locate the median in O(N) time on average.
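
For the host-side route, here is a minimal sketch of how nth_element could be used. The function name median_nth_element is made up for illustration, and it assumes the data has already been copied into a std::vector<float> on the host. For an even element count it averages the two middle values, which is one common convention and not something the thread specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name not from the thread): median of a host-side array.
// The vector is taken by value because std::nth_element reorders its input.
float median_nth_element(std::vector<float> values) {
    assert(!values.empty());
    const std::size_t mid = values.size() / 2;

    // Partial ordering only: values[mid] ends up holding the element a full
    // sort would put there, and nothing before it is greater than it.
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const float upper = values[mid];
    if (values.size() % 2 == 1) {
        return upper;  // odd count: single middle element
    }

    // Even count: the lower middle element is the largest one in the first half.
    const float lower = *std::max_element(values.begin(), values.begin() + mid);
    return 0.5f * (lower + upper);
}
```

Calling median_nth_element(data) on the 500,000-element array leaves the caller's copy untouched; pass by reference instead if reordering the original is acceptable and the extra copy matters.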

antzrhere Dec 8, 2011 8:09 AM (in response to eugenek) To find a *good approximation* of a median you could do it this way:
Run through all values once (0-500000) and get the minimum and maximum value (the range of your data).
Now use a 1D array (size 'n', I'll explain the size later) where the first element represents the min value and the last element represents the max value. Zero all elements.
Run through all the values again (0-500000) and increment the array element that each value falls within,
i.e. Array[ ((value - minvalue) / (maxvalue - minvalue + epsilon)) * arraysize ]++;
(This can easily be optimised to get rid of the division and other ALU work.)
Now read through Array[] (0-'n'), adding up the counts as you go along UNTIL you reach 500000/2. This is the region of space where your median is located. To improve things a little, you could interpolate between adjacent array values when you reach the median.
Concerning what array size to use: the larger the array, the better the likelihood that the calculated median is close to the actual median (improved resolution). Essentially you're just defining a range that the median lies somewhere within. To improve the accuracy (make the range smaller), increase the array size. Of course, if your data sample has a very large range but most values are clustered very close together, then you will need a large array size to get a better approximation of the median.
You can also estimate the lower and upper quartiles by this method.
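
The steps above can be sketched on the host roughly as follows. This is only an illustration under a few assumptions: the function name approx_median_histogram and the parameter bins (the array size 'n') are made up here, the data is held as doubles, the last bin is clamped instead of adding an epsilon to the range, and the interpolation is done linearly inside the bin where the running count crosses half, which is one possible reading of the interpolation suggestion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name not from the thread): histogram-based median estimate.
// `bins` plays the role of the array size 'n'; more bins give finer resolution.
double approx_median_histogram(const std::vector<double>& values, std::size_t bins) {
    assert(!values.empty() && bins > 0);

    // Pass 1: find the range of the data.
    const auto mm = std::minmax_element(values.begin(), values.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    if (lo == hi) {
        return lo;  // all values identical, nothing to bin
    }
    const double width = (hi - lo) / static_cast<double>(bins);

    // Pass 2: count how many values fall into each bin.
    std::vector<std::size_t> counts(bins, 0);
    for (double v : values) {
        std::size_t b = static_cast<std::size_t>((v - lo) / width);
        if (b >= bins) {
            b = bins - 1;  // the maximum value belongs to the last bin
        }
        ++counts[b];
    }

    // Pass 3: accumulate counts until half of the elements are covered.
    const double half = static_cast<double>(values.size()) / 2.0;
    double seen = 0.0;
    for (std::size_t b = 0; b < bins; ++b) {
        const double in_bin = static_cast<double>(counts[b]);
        if (seen + in_bin >= half) {
            // Assumption: values are spread evenly inside the bin.
            const double frac = (half - seen) / in_bin;
            return lo + (static_cast<double>(b) + frac) * width;
        }
        seen += in_bin;
    }
    return hi;  // not reached: the counts always sum to values.size()
}
```

The result is only accurate to about one bin width, so the error bound is roughly (max - min) / bins. Replacing half with a quarter or three quarters of the element count would give the quartile estimates in the same way.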
