Sorting with Heaps

By Herbert J. Bernstein
© Copyright 2003 Herbert J. Bernstein

Introduction

The Shell sort provides a significant improvement in sorting over simple insertion or a bubble sort. Quick sort usually provides even better performance, but its worst case performance is as bad as that of a bubble sort. For a sort that has O(N log N) performance in all cases, we need to look at trees. Fortunately, when we are done, we will actually be able to do our work using arrays.

A tree is a graph. A graph is a collection of nodes connected to one another by edges. If the edges have a direction, the graph is called a directed graph. If the edges do not have a direction, the graph is called an undirected graph. A path in a graph is a sequence of edges obeying the direction rules of the graph such that the end node of each edge in the path (except for the last edge in the path) is the start node of the next edge in the path. A path is a circuit if the end node of the last edge of the path is the start node of the first edge of the path. A graph is called connected if there is a path from every node to every other node.

One or more values (attribute-values) may be associated with each node of a graph.

One or more values (attribute-values) may be associated with each edge of a graph.


         root node
           /|\
          / | \
         /  |  \
        /   |   \
       a    b    c
      /\    |   /|\
     d  e   f  g h i
    / \  \  \    \  \
In a rooted tree every node except the root is connected to the tree by a single edge from a parent node, and the node itself is called a child of the parent. Each child may be viewed as the root of a sub-tree.

In a binary tree each node as at most two children, and each child has an attribute ("side") with the value "left" or "right".

A heap

These conditions are sufficient to define a heap. Often we impose additional conditions to help to simplify code. One important condition is to gather all the longest paths to one side of the tree. Note that we have not required any particular ordering of the children of a given parent. However, it is also common to put the larger child to the left.

Insertion

We can insert into the heap by adding a value at the bottom of the heap in the next empty leaf node position in the bottom of the heap.



   xx       yy      zz
  /  \     /  \    /  \
 a    b   c    d  e   new

Suppose new is the new value to be inserted. If new is less than zz it is in the right place in the heap. If new is greater than zz it must also be greater than e, so swapping zz and new makes a properly configured subtree. This process continues until new bubbles up to the right level.


   xx       yy      new
  /  \     /  \    /  \
 a    b   c    d  e    zz

Deletion

We can remove any value from the heap, removing it, replacing it with the last node on the bottom level, and then bubbling it up or down to the level at which it is no greater than its parent and no less than its children.

Reading the heap

We read the heap by deleting the top element and the restoring it as a heap so that the second largest element is now at the top. This may seem to be destructive, but we can build a second heap with the items we remove, so nothing need be lost.

While a heap is a tree, because there are only two child node for each parent, it can easily be represented in an array. Let a be the array. Let a[0] be the root node. For any i, let a[2*i+1] and a[2*i+2] be the children of a[i]. This allows us to implement a sort using a heap in an array.

See http://www.inf.fh-flensburg.de/lang/algorithmen/sortieren/heap/heapen.htm