Introduction to Searching and Sorting

With the massive amount of data now available on the Internet, many people spend a great deal of time searching for information and organizing it in useful ways. The general topic of searching through information and organizing it goes back to the earliest days of computing, and understanding the fundamentals of these processes is essential to a full understanding of how best to use computers.

The single best reference on this subject is the book by Donald Knuth, "The Art of Computer Programming, Volume 3, Sorting and Searching", Addison-Wesley, Reading, Mass, 1973. It is a testimony to the importance and durability of this text that a second, 1998, edition (ISBN 0201896850) is available.

• The relationship between sorting and searching
• Unsorted data requires exhaustive searching
• If there are no duplicates we may quit on a hit
• It is easier to search sorted data
• A single sort may serve to support many searches
• Exhaustive searches
• Important for many disciplines
• Even unsorted data may have structure (e.g. bounded variation)
• Exhaustive searches in one dimension are typically O(n), where n is some measure of the extent of the data
• Exhaustive searches in multiple dimensions are typically O(nk) where k is the number of dimensions
• When data has structure times may be improved by
• Divide and conquer
• Random sampling
• Sorting data
• Data is organized by the "natural" ordering of some key (e.g. alphabetically)
• May sort the data in place with minimal extra storage
• Insertion and exchange sorts
• Insertions into existing arrays involve many moves
• Insertion into other data structures may not
• Bubble sort
• Multiple passes, exchanging out of order elements
• O(N2)
• Shell sort
• Due to Donald L. Shell (CACM 2,7 (July 1959), 30-32)
• Diminishing increment sort
• Sort first on widely spaced elements separated by a gap
• Keep reducing the gap (say by half) until we are sorting on adjacent elements
• Can do each sort by simple insertion and shifting up or by a bubble sort
• Can sort all combs of a given gap or just one comb of a given gap
• Time estimation is complex.
• Crudely O(N1.5).
• With proper choice of gaps, O(Nec√(ln N))
• Merge exchange sort
• Parallel version of Shell sort
• Due to K. E. Batcher (proc. AFIPS Spring Joint Computer Conference 32 (1968), 307-314)
• Sort multiple combs of the same gap on separate processors, then merge
• Quick sort (see: http://en.wikipedia.org/wiki/Quicksort).
• Also called partition-exchange sort
• Due to C. A. R. Hoare (Comp. J. 5 (1962), 10-15)
• Recursively subdivide the array
• Pick an element of the array
• Partition the array a into two sets:
• All elements smaller than the chosen element
• All element larger than the chosen element
• Take the first element a[0] as the tentative chosen element
• Starting with the second element run from up from the bottom (index i) and last element run down from top of the array (index j) simultaneously, looking for elements on the wrong side of the chosen element a[0] (a[i] > a[0] and a[j] < a[0]. When pairs are found with i < j, exchange them. If i ≥ j, swap a[0] and a[j].
• Recursively sort each partition (not including the former a[0]), shorter partition first
• Typically O(N ln N), but worst case O(N 2)
• Can be improved to avoid the worst case most of the time
• May provide external data structures
• heap
• also known as priority queue
• a tree; each node has a value and a left and right pointer
• for each tree and subtree, the node at the top is no smaller (or no larger) than any node below it.
• the tree is balanced -- the depth of each terminal leaf node is k or k-1, for some k (approximately log2 N)
• insert into the tree by adding adding a node at the bottom and bubble sorting the value up to the right level (O(ln N)).
• delete by bubbling the hole down (O(ln N)).
• Can be managed as an array or as a tree with pointers
• As an array, a[0] implicitly points to a[1] and a[2]; a[1] points to a[3] and a[4]; a[2] points to a[5] and a[6]; ...; a[k] points to a[2k+1] and a[2k+2]; ...
• heapsort O(N ln N)
• Searching
• Sequential search on an unordered array
• Sequential search on an ordered array
• Binary search on an ordered array
• Bin search
• Hash table
• Searching a binary tree
• Searching a heap

Last Updated on 12 April 2011
By Herbert J. Bernstein
Email: yaya@bernstein-plus-sons.com