Given an unsorted array, I want to find its median without fully sorting it, or by only partially sorting it, with the lowest possible complexity, using OpenCL. Should I use a parallel bubble sort to partially sort the array and get the median, or is there another method? Please suggest something as soon as possible. :)
The array has around 500,000 elements.
Intuitively, it would surprise me if there were a way to do this without extracting all the information you would get from a sort. You need the data conceptually ordered, and the only way to get that order may be to sort. Rather than implementing your own algorithm, I would think the best thing would be to use a fast sort and then extract the median from the sorted data.
Sir,
I only need the median value with minimum complexity, that's it. I don't need to sort the whole array for that (I guess). So I think there is no need to sort the whole array and then calculate the median; instead we can partially sort the array.
You are looking for a selection algorithm such as "median of medians"; there is an article on Wikipedia. I'm not aware of any GPU-accelerated implementations, unfortunately. If you can afford to copy the array to the host, the C++ STL algorithm "nth_element" will locate the median in O(N) time on average.
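For the host-side route, a minimal sketch of the nth_element approach might look like the following. The function name median_nth_element, the float element type, and the choice to average the two middle elements for an even count are assumptions for illustration, not something specified in the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: exact median via std::nth_element (linear on average).
// The vector is taken by value because nth_element reorders its input.
double median_nth_element(std::vector<float> values) {
    if (values.empty()) throw std::invalid_argument("empty input");
    const std::size_t mid = values.size() / 2;

    // After this call values[mid] holds the element that would be there
    // if the array were sorted; everything before it is <= values[mid].
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];
    if (values.size() % 2 == 1) return upper;

    // Even count (assumed convention): average the two middle elements.
    // The lower one is the maximum of the left partition.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return (lower + upper) / 2.0;
}
```

Called as median_nth_element(data) on a std::vector<float> holding the 500,000 values, this only partially orders a copy of the data, which is the "partial sort" idea from the question.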
To find a *good approximation* of a median you could do it this way:
Run through all values once (0-500000) and get the minimum and maximum value (the range of your data).
Now use a 1D array (size 'n' - I'll explain the size later) where the first element represents the min value and the last element represents the max value. Zero all elements.
Run through all the values again (0-500000) and increment the array element that the value falls within.
i.e. Array[ ((value-minvalue) / (maxvalue-minvalue+epsilon)) * arraysize]++;
(This can be easily optimised to get rid of division and other ALU).
Now read through Array[] (0-'n'), adding up the counts as you go along UNTIL you reach 500000/2. The element where that happens covers the region where your median is located. To improve things a little, you could interpolate between adjacent array values when you reach the median.
Concerning what array size to use: the larger the array, the better the likelihood that the calculated median is close to the actual median (improved resolution). Essentially you're just defining a range that the median is located somewhere within. To improve the accuracy (make the range smaller), increase the array size. Of course, if your data sample has a very large range but most values are clustered very close together, then you will need a large array size to get a good approximation of the median.
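The steps above can be sketched on the host in C++ as follows. The names approx_median_histogram and bins are made up for this example, and the clamp of the top bucket stands in for the epsilon in the index formula. It uses linear interpolation inside the bucket that contains the median, which is one possible reading of the interpolation suggestion. The counting loop is the part that would be moved into an OpenCL kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: approximate median from a histogram with `bins` buckets.
double approx_median_histogram(const std::vector<float>& data, std::size_t bins) {
    if (data.empty() || bins == 0) throw std::invalid_argument("empty input or zero bins");

    // Pass 1: find the range of the data.
    const auto mm = std::minmax_element(data.begin(), data.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    if (lo == hi) return lo;  // all values identical, nothing to bucket

    // Pass 2: count how many values fall into each bucket.
    std::vector<std::size_t> hist(bins, 0);
    const double scale = bins / (hi - lo);  // one multiply per value, no division
    for (float v : data) {
        std::size_t b = static_cast<std::size_t>((v - lo) * scale);
        if (b >= bins) b = bins - 1;  // the maximum value lands in the last bucket
        ++hist[b];
    }

    // Pass 3: accumulate counts until half of the data has been seen.
    const double target = data.size() / 2.0;
    const double width = (hi - lo) / bins;
    double seen = 0.0;
    for (std::size_t b = 0; b < bins; ++b) {
        if (seen + hist[b] >= target) {
            // Assumed refinement: interpolate linearly inside this bucket.
            const double frac = (target - seen) / hist[b];
            return lo + (b + frac) * width;
        }
        seen += hist[b];
    }
    return hi;  // not reached for non-empty input
}
```

The result is only guaranteed to lie within about one bucket width of the true median, so the bins argument plays the role of the array size 'n' discussed above.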
You can also use this method to estimate the lower and upper quartiles.