Searching and Sorting
By Herbert J. Bernstein
© Copyright 1999, 2000, 2003 Herbert J.
Bernstein
Introduction to Searching and Sorting
With the massive amount of data now available on the Internet, many
people spend a great deal of time searching for information and
organizing it in useful ways. The general topic of searching
through information and organizing it goes back to the earliest
days of computing, and understanding the fundamentals of these
processes is essential to a full understanding of how best to
use computers.
The single best reference on this subject is the book by Donald Knuth,
"The Art of Computer Programming, Volume 3, Sorting and Searching",
Addison-Wesley, Reading, Mass, 1973. It is a testimony to the importance
and durability of this text that a second, 1998, edition (ISBN 0201896850)
is available.
- The relationship between sorting and searching
- Unsorted data requires exhaustive searching
- If there are no duplicates we may quit on a hit
- It is easier to search sorted data
- A single sort may serve to support many searches
- Exhaustive searches
- Important for many disciplines
- Even unsorted data may have structure (e.g. bounded variation)
- Exhaustive searches in one dimension are typically O(n), where
n is some measure of the extent of the data
- Exhaustive searches in multiple dimensions are typically
O(nk) where k is the number of dimensions
- When data has structure times may be improved by
- Divide and conquer
- Random sampling
- Sorting data
- Data is organized by the "natural" ordering of some key (e.g.
alphabetically)
- May sort the data in place with minimal extra storage
- Insertion and exchange sorts
- Insertions into existing arrays involve many moves
- Insertion into other data structures may not
- Bubble sort
- Multiple passes, exchanging out of order elements
- O(N2)
- Shell sort
- Due to Donald L. Shell (CACM 2,7 (July 1959), 30-32)
- Diminishing increment sort
- Sort first on widely spaced elements separated by a gap
- Keep reducing the gap (say by half) until we are sorting
on adjacent elements
- Can do each sort by simple insertion and shifting up
or by a bubble sort
- Can sort all combs of a given gap or just one comb of
a given gap
- Time estimation is complex.
- Crudely O(N1.5).
- With proper choice of gaps, O(Nec√(ln N))
- Merge exchange sort
- Parallel version of Shell sort
- Due to K. E. Batcher (proc. AFIPS Spring Joint Computer Conference 32 (1968), 307-314)
- Sort multiple combs of the same gap on separate processors, then merge
- Quick
sort (see:
http://en.wikipedia.org/wiki/Quicksort).
- Also called partition-exchange sort
- Due to C. A. R. Hoare (Comp. J. 5 (1962), 10-15)
- Recursively subdivide the array
- Pick an element of the array
- Partition the array a into two sets:
- All elements smaller than the chosen element
- All element larger than the chosen element
- Take the first element a[0] as the tentative chosen
element
- Starting with the second element
run from up from the bottom (index i) and last element run down from top
of the array (index j) simultaneously, looking for elements on the wrong
side of the chosen element a[0] (a[i] > a[0] and a[j] < a[0]. When
pairs are found with i < j, exchange them. If i ≥ j, swap a[0]
and a[j].
- Recursively sort each partition (not including the former
a[0]), shorter partition first
- Typically O(N ln N), but worst case O(N 2)
- Can be improved to avoid the worst case most of the time
- May provide external data structures
- heap
- also known as priority queue
- a tree; each node has a value and a left and right pointer
- for each tree and subtree, the node at the top is
no smaller (or no larger) than any node below it.
- the tree is balanced -- the depth of each terminal leaf
node is k or k-1, for some k (approximately log2 N)
- insert into the tree by adding adding a node at the bottom
and
bubble sorting the value up to the right level (O(ln N)).
- delete by bubbling the hole down (O(ln N)).
- Can be managed as an array or as a tree with pointers
- As an array, a[0] implicitly points to a[1] and a[2];
a[1] points to a[3] and a[4]; a[2] points to a[5] and a[6]; ...;
a[k] points to a[2k+1] and a[2k+2]; ...
- heapsort O(N ln N)
- Searching
- Sequential search on an unordered array
- Sequential search on an ordered array
- Binary search on an ordered array
- Bin search
- Hash table
- Searching a binary tree
- Searching a heap
Last Updated on 12 April 2011
By Herbert J. Bernstein
Email: yaya@bernstein-plus-sons.com