Technical Articles => Algorithm =>  Algorithm

Find the kth smallest number in an array

Source : sonic0002    Date : 2013-01-09 06:20:54  

This is an classical question, the general solution to this question is first using sorting algorithm to sort the array, then get the kth number in the array, the complexity is O(nlogn), we can also use selective sort or head sort to, the complexity of selective sort is O(kn) and heap sort is O(nlogk). The better solution is to use quick sort to find the kth smallest number, the complexity is O(n), the worst case cost is O(n^2). But today we introduce one more solution which has the worst case cost O(n).

First let's check the quick sort solution, quick sort is to find one number, then put all numbers in the array which is less than the number chosen on one side and then put the other numbers on the other side. Then recursively  sort them again. Then to find the kth smallest number in the array, we only need to find it in one side of the array.

  1. import random
  2.  
  3. def partition(arr, left, right, pivot):
  4.     v = arr[pivot]
  5.     arr[pivot], arr[right-1] = arr[right-1], arr[pivot]
  6.     index = left
  7.     for i in xrange(left, right):
  8.         if arr[i] <= v:
  9.             arr[i], arr[index] = arr[index], arr[i]
  10.             index += 1
  11.     return index-1
  12.  
  13. def select(arr, left, right, k):
  14.     while right - left > 1:
  15.         index = partition(arr, left, right, random.randint(left, right-1))
  16.         dist = index - left + 1
  17.         if dist == k:
  18.             return arr[index]
  19.         if dist < k:
  20.             k -= dist
  21.             left = index + 1
  22.         else:
  23.             right = index
  24.     return arr[left]

Here arr is the array to search, we can call select to find the kth smallest number, if we cannot find the pivot element correctly, the worst case cost will be O(n^2).

Now we discuss the solution which has worst case cost O(n), we can divide all these numbers into some subarrays which have 5 elements in each subarray, then there will be n/5 subarrays. For each subarray, we can find the middle number very fast, then we can find the middle number of these n/5 middle numbers. This algorithm is called Median of Medians algorithm.

If we use the median of the medians as the pivot, then there will be 3/5*1/2 which is 3/10 numbers which are less or equal to pivot, also there will be 3/10 numbers which are greater than pivot, so the worst case cost is the array will be divided into 30%,70% or 70%,30% two parts.

The cost is :

T(n)<=T(n/5)+T(7/10*n)+O(n)<=c*n*(1+9/10+(9/10)^2....)
So T(n)=O(n)

The worst case cost is O(n).

  1. import heapq
  2.  
  3. def partition(arr, left, right, pivot):
  4.     v = arr[pivot]
  5.     arr[pivot], arr[right-1] = arr[right-1], arr[pivot]
  6.     index = left
  7.     for i in xrange(left, right):
  8.         if arr[i] <= v:
  9.             arr[i], arr[index] = arr[index], arr[i]
  10.             index += 1
  11.     return index-1
  12.  
  13. def select_heap(arr, left, right, k):
  14.     tmp = arr[left:right]
  15.     heapq.heapify(tmp)
  16.     [heapq.heappop(tmp) for i in xrange(k-1)]
  17.     return heapq.heappop(tmp)
  18.  
  19. def median(arr, left, right):
  20.     num = (right - left - 1) / 5
  21.     for i in xrange(num+1):
  22.         sub_left = left + i*5
  23.         sub_right = sub_left + 5
  24.         if sub_right > right:
  25.             sub_right = right
  26.         m_index = select_heap(arr, sub_left, sub_right, (sub_right-sub_left)/2)
  27.         arr[left+i], arr[m_index] = arr[m_index], arr[left+i]
  28.     return select(arr, left, left+num+1, (num+1)/2)
  29.  
  30. def select(arr, left, right, k):
  31.     while right - left > 1:
  32.         pivot = median(arr, left, right)
  33.         index = partition(arr, left, right, pivot)
  34.         dist = index - left + 1
  35.         if dist == k:
  36.             return arr[index]
  37.         if dist < k:
  38.             k -= dist
  39.             left = index + 1
  40.         else:
  41.             right = index
  42.     return arr[left]

Source : http://www.isnowfy.com/top-k-number/

Save as PDF Mark as read Mark as important
By clicking the "Mark as read" button, this article will be marked as read. It will be removed from the homepage's latest news and the article list on the "Technical article" page in following visits and it will be put to your read list which you can find in "Amin->Article read list". There you can unmark the read articles.
By clicking the "Mark as important" button, this article will be put to your important article list which you can find in "Amin->Article important list". Later when you want reread this article, it's easier for you to find it by checking the "Article important list".

Tags:Sort, Quick sort, Search,Smallest   Read(1507) Comment(0)

Share on Facebook  Share on Twitter  Share on Google+  Share on Weibo  Share on Digg  Share on Tumblr    Delicious 

 Previous : Output a file with HTTP range header in PHP
 Next : Big file transfer in Linux

  ::Related Articles

  ::Comment Zone

No comment for this article.

  ::Comment

Nickname  
Email 
Comment

:: Other versions

No other versions available yet.

:: Recent articles

:: Most read

:: Most commented

:: Contribute

Want to share with the world your understanding about technology? Want to record the process you solve a technical problem? Want to make the world benefit from your understanding and solution? Write them down. You make the world better, the world makes us better.

 Write article

:: Find us

Back to top