© Copyright 2003 Herbert J. Bernstein

The Shell sort provides a significant improvement in sorting over simple insertion or a bubble sort. Quick sort usually provides even better performance, but its worst case performance is as bad as that of a bubble sort. For a sort that has O(N log N) performance in all cases, we need to look at trees. Fortunately, when we are done, we will actually be able to do our work using arrays.

A **tree** is a **graph**. A **graph** is a collection of
**nodes**
connected to
one another by **edges**. If the edges have a direction, the graph is
called a **directed graph**. If the edges do not have a direction,
the graph is called an **undirected graph**. A path in a graph
is
a sequence of edges obeying the direction rules of the graph such
that the end node of each edge in the path (except for the last
edge in the path) is the start node of the next edge in the path.
A path is a circuit if the end node of the last edge of the path
is the start node of the first edge of the path. A graph is
called **connected** if there is a path from every node to every other
node.

One or more values (attribute-values) may be associated with each node of a graph.

One or more values (attribute-values) may be associated with each edge of a graph.

- A
**tree**is a directed, connected graph with no cycles - A
**rooted tree**is a tree with one distingushed node (the root) from which there is a path to every other node

root node /|\ / | \ / | \ / | \ a b c /\ | /|\ d e f g h i / \ \ \ \ \

In a **binary tree** each node as at most two children, and each
child has an attribute ("side") with the value "left" or "right".

- A heap is a binary tree with a value associated with each node
- The value of the root of the tree and of each subtree is at least as great as the value of all the nodes below it
- The length of each path to each terminal node of the tree is either the same maximum value or one less than that value

These conditions are sufficient to define a heap. Often we impose
additional conditions to help to simplify code. One important condition
is to gather all the longest paths to one side of the tree. Note that
we have **not** required any particular ordering of the children
of a given parent. However, it is also common to put the larger child
to the left.

We can insert into the heap by adding a value at the bottom of the heap in the next empty leaf node position in the bottom of the heap.

xx yy zz / \ / \ / \ a b c d e new

xx yy new / \ / \ / \ a b c d e zz

We can remove any value from the heap, removing it, replacing it with the last node on the bottom level, and then bubbling it up or down to the level at which it is no greater than its parent and no less than its children.

We read the heap by deleting the top element and the restoring it as a heap so that the second largest element is now at the top. This may seem to be destructive, but we can build a second heap with the items we remove, so nothing need be lost.

While a heap is a tree, because there are only two child node for each parent, it can easily be represented in an array. Let a be the array. Let a[0] be the root node. For any i, let a[2*i+1] and a[2*i+2] be the children of a[i]. This allows us to implement a sort using a heap in an array.

See http://www.inf.fh-flensburg.de/lang/algorithmen/sortieren/heap/heapen.htm