5 Replies Latest reply on Dec 8, 2011 8:09 AM by antzrhere

    finding median in array

      Finding the median of an array: which sorting algorithm is suitable?

      Given an unsorted array, I want to find its median with the minimum possible complexity using OpenCL, either without sorting the array or by only partially sorting it. Should I use a parallel bubble sort and partially sort the array to get the median, or is there another method? Please suggest something as soon as possible. :)

        • finding median in array

          The array has around 500000 elements.

            • finding median in array

              Intuitively, it would surprise me if there were a way to do this without extracting all the information you would get from a sort. You need the data conceptually ordered, and the only way to get that order may be to sort. Rather than implementing your own algorithm, I would think the best thing would be to use a fast sort and then read the median from the sorted data.

            • finding median in array

              You are looking for a selection algorithm such as "median of medians"; there is an article on Wikipedia. Unfortunately, I'm not aware of any GPU-accelerated implementations. If you can afford to copy the array to the host, the C++ STL algorithm "nth_element" will locate the median in O(N) time on average.
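
As a rough illustration of the host-side suggestion above, here is a minimal sketch using std::nth_element. The function name median_nth_element, the float element type, and the choice to average the two middle values for an even count are assumptions made for this example, not something stated in the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: median via std::nth_element (linear time on average).
// The vector is taken by value so the caller's data is not reordered.
double median_nth_element(std::vector<float> values) {
    if (values.empty()) {
        throw std::invalid_argument("median of an empty array is undefined");
    }
    const std::size_t mid = values.size() / 2;

    // After this call values[mid] holds the element that would be there
    // in a fully sorted array; everything before it is <= values[mid].
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];
    if (values.size() % 2 == 1) {
        return upper;
    }

    // Even count (assumed convention): average the two middle elements.
    // The lower middle is the largest element of the left partition.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return (lower + upper) / 2.0;
}
```

For the 500000-element case in the question, this would mean reading the buffer back to the host first and then calling median_nth_element on the copy.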

                • finding median in array

                  To find a *good approximation* of a median you could do it this way:


                  Run through all values once (0-500000) and get the minimum and maximum values (the range of your data).

                  Now use a 1D array (size 'n' - I'll explain the size later) where the first element represents the min value and the last element represents the max value. Zero all elements.

                  Run through all the values again (0-500000) and increment the array element that the value falls within.

                  i.e. Array[ ((value-minvalue) / (maxvalue-minvalue+epsilon)) * arraysize ]++;

                  (This can be easily optimised to get rid of division and other ALU).

                  Now read through Array[] (0-'n'), adding up the counts as you go along UNTIL you reach 500000/2. This is the region where your median is located. To improve things a little, you could interpolate between adjacent array values when you reach the median.


                  Concerning what array size to use: the larger the array, the better the likelihood that the calculated median is close to the actual median (improved resolution). Essentially you're just defining a range that the median lies somewhere within. To improve the accuracy (make the range smaller), increase the array size. Of course, if your data sample has a very large range but most values are clustered very close together, then you will need a large array size to get a good approximation of the median.


                  You can also estimate the lower and upper quartiles with this method.
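
The histogram steps above can be sketched on the host as follows. This is only an illustrative, single-threaded version under a few assumptions: the name approx_quantile and its parameters are made up for this example, the input is assumed to hold finite floats, the top bin is clamped instead of using an epsilon, and the interpolation is taken within the bin where the running count crosses the target. An OpenCL version would presumably do the min/max pass as a reduction and the counting pass with atomic increments, which is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: approximate the q-quantile (q = 0.5 is the median)
// with a fixed-size histogram. The result lies inside the bin that contains
// the target rank, so the error is bounded by one bin width.
double approx_quantile(const std::vector<float>& values,
                       std::size_t num_bins, double q = 0.5) {
    if (values.empty() || num_bins == 0) {
        throw std::invalid_argument("need at least one value and one bin");
    }

    // Pass 1: find the range of the data.
    const auto mm = std::minmax_element(values.begin(), values.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    if (lo == hi) {
        return lo;  // all values identical
    }

    // Pass 2: count how many values fall into each bin.
    std::vector<std::size_t> bins(num_bins, 0);
    const double scale = static_cast<double>(num_bins) / (hi - lo);
    for (float v : values) {
        std::size_t idx = static_cast<std::size_t>((v - lo) * scale);
        if (idx >= num_bins) idx = num_bins - 1;  // the maximum lands in the last bin
        ++bins[idx];
    }

    // Pass 3: accumulate counts until the target rank is reached,
    // then interpolate linearly inside that bin.
    const double target = q * static_cast<double>(values.size());
    const double bin_width = (hi - lo) / static_cast<double>(num_bins);
    double cumulative = 0.0;
    for (std::size_t i = 0; i < num_bins; ++i) {
        const double next = cumulative + static_cast<double>(bins[i]);
        if (bins[i] > 0 && next >= target) {
            const double frac = (target - cumulative) / static_cast<double>(bins[i]);
            return lo + (static_cast<double>(i) + frac) * bin_width;
        }
        cumulative = next;
    }
    return hi;
}
```

Calling approx_quantile(data, n) gives the median estimate, while q = 0.25 and q = 0.75 give the lower and upper quartile estimates; increasing n narrows the bin width and therefore the error bound.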