Graph Theory for Network Design

© Copyright 2011 Herbert J. Bernstein
Portions derived from
G22.3023 Special Topics of Industrial Interest
Network Design and Implementation
Lecture Notes Spring 84
© Copyright 1984, 1991
Herbert J. Bernstein & Max Goldstein

Introduction

The tools of graph theory find extensive application in network design. For a grounding in the subject, see R.G. Busacker & T.L. Saaty, "Finite Graphs and Networks: An Introduction with Applications", McGraw-Hill, New York, 1965, 294 pp.

When we draw a picture of a communications network, it is usually as a set of nodes connected by communications lines. If we record the physical locations of the nodes as points in 3-space and the paths of the lines as curves in 3-space, the sets V, of points and E, of curves, constitute a "geometric graph". While there are communications design problems which require exact knowledge of node locations and line paths, as in cabling a room full of equipment, much can be learned from the abstract topology of the graph of a communications network.

The essential features we require are distinct labels for the nodes and lines, and a description of which lines connect which nodes. We will wish to supplement that information with line capacities and costs. For the nodes, define a set V of abstract vertices {v[i]}. For the lines, define a set, E, of abstract edges {e[i]}, and a mapping f, from E into VxV, the set of pairs of vertices. If we care about the permitted direction of flow of information, then, if f(e[k]) = (v[i],v[j]), v[i] will be the origin of the flow, and v[j] will be the sink for the flow. If we do not care about the direction, then (v[i],v[j]) will be considered equivalent to (v[j],v[i]).

Abstract Undirected Graph

We define an abstract undirected graph, G = <V,E,f>, as a set of vertices V, a set of edges E, and a mapping f from E to (V x V)/{(v[i],v[j])=(v[j],v[i])}. Usually we simply call G a graph, and consider f implicit in E. For simplicity, we will confine our attention to undirected graphs at present.

Incidence Matrix

If two edges are incident on the same pair of vertices, those edges are said to be parallel. Since we can treat multiple lines between a pair of nodes as one line of composite characteristics, we forbid the case of parallel edges in the abstract graphs under consideration. With that minimal restriction, we can represent a finite graph by its "incidence matrix", where each row and column corresponds to a vertex, and a non-zero entry, say 1, is made for each edge joining a pair of vertices.

For example, consider the following graph:



     v[1]---------------v[2]--------------v[3]
      | \                                / |
      |  \                              /  |
      |   \                            /   |
      |    \                          /    |
      |     \                        /     |
     v[4]---v[5]-------------------v[6]---v[7]
      | \
      |_|

which can be represented by:

  
           v[1]   v[2]   v[3]   v[4]   v[5]   v[6]   v[7]
     v[1]    0      1      0      1      1      0      0
     v[2]    1      0      1      0      0      0      0
     v[3]    0      1      0      0      0      1      1
     v[4]    1      0      0      1      1      0      0
     v[5]    1      0      0      1      0      1      0
     v[6]    0      0      1      0      1      0      1
     v[7]    0      0      1      0      0      1      0

Notice that the only diagonal element which is non-zero is due to the loop from v[4] to itself and that the matrix is symmetric around the main diagonal. (If we had had a directed graph, that symmetry could have been disturbed).

The incidence matrix of a graph can be used to trace paths through sequences of edges. Indeed, the nth power of the incidence matrix contains a non-zero entry for each pair of vertices which can be connected by edge sequences of length n.

Since we are interested in data rates and costs, we may replace the ones in an incidence matrix by some representation of capacity and cost, and use it much the same way. Before doing so, however, we will require some more notation.

Two vertices are called adjacent if some edge joins them. Two edges are called adjacent if some vertex is common to both. A vertex is incident with an edge if it is incident with one of the endpoints of the edge. In that case, we may also say the edge is incident with the vertex.

Degree and Euler's Theorem

The number of edges incident with a vertex, v, counting loops twice, is the degree of v, deg(v). An isolated vertex is a vertex of degree zero. A terminal vertex is a vertex of degree one. A graph is regular of degree d if every vertex in the graph is of degree d.

Euler proved that, for a graph G with vertices V = {v[i], i = 1,..,nv} and edges E = {e[i], i = 1,..,ne}, the summation of deg(v[i]) = 2*ne. (Proof: each edge adds one to the degree of each of two vertices.)

For a finite graph it follows that the number of vertices of odd degree is even.

Proof:



     _____                     _____
     \                         \
      \                         \
      /   deg(v[i])  = 2*ne -   /   deg(v[i])
     /____                     /_____
     odd degrees               even degrees

must be even, since all the terms on the right hand side are even. This implies that there must be an even number of vertices of odd degree, or the left hand side would be odd.

Paths, Circuits, Hamiltonian Circuits

Since we have forbidden parallel edges, there will be no confusion if we write edges, e[k], as the vertex pairs (v[i],v[k]) to which they correspond. A sequence of edges (v[i1],v[j1]), (v[i2],v[j2]), ..., (v[in],v[jn]) is said to form a path of length n if v[jk]=v[i(k+1)], k=1,...,n-1, and v[i1], v[i2], ..., v[in], v[jn] are all distinct. A circuit is similarly defined, except v[jn]=v[i1].

A geodesic is a path of minimal length between two vertices. A graph is connected if there is at least one path connecting every pair of vertices. The diameter of a connected graph is the length of the longest geodesic. A Hamiltonian path is a path which includes every vertex. A Hamiltonian circuit is a circuit which includes every vertex.

A connected graph with no circuits is called a tree. Removal of one edge from a tree must disconnect it, so a tree is, in that sense, a minimal connected graph. Given a finite graph, one can find many trees within it. A tree within a graph which includes all the vertices of the full graph is called a spanning tree. If we assign some cost to each edge, a spanning tree with minimal total cost is called a minimal spanning tree. A spanning tree of a graph need not be unique. A minimal spanning tree of a graph also need not be unique.

Minimal Cost Path

Let us now consider the question of finding a path of minimal total cost between two vertices of a connected graph with costs assigned to its edges. Since loops from vertices to themselves cannot contribute to such a path, we discard them.

First we will consider a rather inefficient algorithm which has the advantage of being obviously correct. Then we will consider an efficient algorithm due to Dijkstra, which avoids unnecessary work, but which is less obvious.

Form the incidence matrix of the graph, with entries of the form C[i,j] or 0, where C[i,j] is the cost of the edge from vertex v[i] to vertex v[j], and an entry of zero means there is no edge. Now use this matrix to compute a new matrix of least costs of paths of length 2, by forming the minima of the sums C[i,k]+C[k,j] for which both terms exist. Repeat the process until we have cost matrices for all possible paths.

Now a path of minimal cost from vertex v[i] to vertex v[j] must be represented by the least (i,j) entry among these matrices, and, retracing the creation of that entry, we are done.

Now that you believe we can find the required path, let us find it more efficiently (E.W. Dijkstra, "A Note on Two Problems in Connexion with Graphs", Numer. Math. 1, pp 269-271, 1959). To understand this algorithm, one should realize that a path of minimal cost from v[i] to v[j] through v[k] must include a path of minimal cost from v[i] to v[k]. If it did not, we could replace that portion of the original path with a path of lower cost and decrease the total cost of the path.



     v[i]-----...---v[k]-----...---v[j]
       \_____________/

Thus the inductive step in finding a path of minimal cost between two vertices is to extend previously found paths of minimal cost by oneedge at a time in order of increasing minimal path length.

Let the starting vertex be v[i1] and the ending vertex be v[in]. Mark all vertices other than v[i1] with a tentative total cost of infinity. Mark v[i1] with a permanent cost of zero, and mark its immediate neighbors with tentative costs given by C[i1,j]. Some of these first neighbors will have a minimal cost. We claim that that cost cannot be lowered for those vertices. The only way it could, would be if there was a multiple edge path from v[i1] to such a vertex with lower total cost, but that would imply that the very first edge in that multiple edge path has strictly lower cost than the supposed minimum of the single edge costs -- a contradiction. Thus we can permanently mark any (or all) of the first neighbors of minimal cost, noting also the previous vertex on the path.


     ______________________________________
    |  v[in]                               |
    |        vertices of unknown           |
    |        minimal path cost             |
    |     ___________________________      |
    |    |                           |     |
    |    |   vertices one edge from  |     |
    |    |   vertices of known       |     |
    |    |   minimal path cost       |     |
    |    |     _________________     |     |
    |    |    |                 |    |     |
    |    |    |   vertices of   |    |     |
    |    |    |   known minimal |    |     |
    |    |    |   path cost     |    |     |
    |    |    | v[i1]           |    |     |
    |    |    |_________________|    |     |
    |    |___________________________|     |
    |______________________________________|

We will now extend the set of vertices with known minimal path cost from v[i1]. At each stage, we will have tentatively marked a finite path cost to all vertices within one edge of permanently marked vertices. The inductive step is as follows. Suppose we have just marked a vertex with the minimal cost of a path to that vertex, and with the prior vertex in that path. Now, to ensure that we have tentatively marked all vertices within one edge of permanently marked vertices, examine the neighbors of this newly marked vertex. If they are not marked with a finite cost, assign one. If they are already marked with a tentative finite cost, see if the newly introduced vertex allows a lower cost path. If so, lower the tentative cost and make the newly permanently marked vertex the predecessor.

Now find one of the tentatively marked vertices of minimal cost. We claim that its cost cannot be further lowered. If it could, there would be a path of lower cost, say v[i1]--v[i2]--...--v[ik], to this vertex v[ik]. If v[i(k-1)] had been marked permanently earlier, then we would have already known about this lower cost path, a contradiction. If v[i(k-1)] was only tentatively marked, then find the first vertex in this lower cost path which is tentatively marked. Since that vertex is within one edge of a permanently marked vertex, it must have a finite tentative marking, but a finite tentative marking which is strictly less than our claimed minimal tentative marking, again a contradiction.

We continue until v[in] is marked permanently.

Directed Graphs

Now let us turn our attention to directed graphs, so that we can handle data flow problems. An abstract directed graph, G = <V,E,f>, consists of a set of vertices V, a set of edges E, and a mapping f from E to V x V. As before, we forbid the case of parallel edges, but, in a directed graph, an edge from v[i] to v[j] is not parallel to an edge from v[j] to v[i], so both may exist. Under this assumption, we may identify an edge by its initial and terminal vertices, and use an incidence matrix to describe the graph.

An edge has positive incidence with its initial vertex and negative incidence with its terminal vertex. The number of edges of positive incidence with a vertex is the positive degree of a vertex. The number of edges of negative incidence with a vertex is its negative degree. Clearly the sum of the positive degrees of the vertices must equal the sum of the negative degrees of the vertices and must also equal the number of edges.

A path in a directed graph is defined as in an undirected graph, but the edge directions must be consistent with one another. A cycle in a directed graph is defined in the same manner as a circuit in an undirected graph, but, again, the edge directions must be consistent.

A directed graph is connected if the equivalent undirected graph is connected. A directed graph is strongly connected if, for every pair of vertices v[i] and v[j], there exists a path from v[i] to v[j] and a path from v[j] to v[i]. A directed graph is strongly k-connected if, for every pair of vertices, v[i] and v[j], there are k distinct paths from v[i] to v[j] which have only v[i] and v[j] in common.

Flows and Networks

In the graph theoretical study of flows, a network is a finite directed graph which is connected and has no loops from vertices to themselves. We will restrict our attention to networks without parallel edges. The major original contributors to the study of flows in networks were L.R. Ford, Jr., D.R. Fulkerson, and G.B. Dantzig.

Consider a network with a limiting flow capacity assigned to each edge. A flow in the graph is an assignment of values to the edges. The value assigned to an edge (v[i],v[j)) indicates a flow from v[i] to v[j] if positive, and a flow from v[j] to v[i] if negative.

A flow is called feasible for the network if it is positive on each edge and no greater than the capacity of the edge.

The total of the flows out from a vertex, v, minus the total of the flows into the vertex is called the net output of v, Q(v). A vertex with positive net flow is called a source. A vertex with negative net flow is called a sink. A vertex with zero net flow, i.e. which conserves flow, is called an intermediate vertex.



     \                     /
      \                   /
       \               3 /
     5  \               /
        _\|           _/
          \           /|
           \         /                 Q(v) = 3+2-5-4 = -4 (a sink)
            \       /
       4     \     /     2
              \   /
     ----->-----v------>---------

If there is only one source and only one sink in the network, then the flow is said to be from the source to the sink. By adding new vertices and edges with properly assigned flows, we can change any flow into a flow with just one source and one sink. To do so, we add a new vertex, which will become the single source, with outgoing flows to each of the original sources equal to their net flows. Then we add a second new vertex, which will become the single sink, with incoming flows from each of the original sinks.

If the original flow was feasible and we assign capacities to the new edges equal to the new flow values, the new flow is feasible, has only outgoing flow values from the single source, only incoming flow values to the single sink, and no net flow at the rest of the vertices. Since the total of all the net flows must be zero (each edge makes a balancing contribution to two vertices), the outgoing flow from the source must equal the magnitude of the incoming flow to the sink. (Notice that some authors call these combined, stronger conditions the definition of a feasible flow).



             ___\____ ...   ... _____\___
            /   /                    /   \
           /                              \
          /                                \
     source_____\____ ...   ... _____\____sink
          \     /                    /     /
           \                              /
            \___\____ ...   ... _____\___/
                /                    /

cuts

Let v[1] and v[2] be vertices. A v[1]-v[2] cut is a set of edges, the removal of which disconnects all paths from v[1] to v[2]. A cut is called minimal if restoring any of the edges in the cut will reconnect the vertices in question. The capacity of a cut is the sum of the capacities of the edges in the cut.

We can use judiciously chosen minimal cuts to analyse feasible flows between vertices. It is important to realize that we are only looking at the flow between pairs of vertices, not at some generalized total flow in the network.

Max-Flow-Min-Cut Theorem

If we have a feasible flow in which v[1] is the only source and v[2] is the only sink, we say the flow is from v[1] to v[2]. The magnitude of that flow is defined as Q(v[1]) = -Q(v[2]). The magnitude of the flow cannot exceed the capacity of a v[1]-v[2] cut (a fortiori of a cut of minimal capacity), so the maximal flow from v[1] to v[2] is bounded above by the capacity of a v[1]-v[2] cut of minimal capacity.

Further, if the maximal flow could not reach the capacity of a cut of minimal capacity, then the limitation on flow would be due to a cut of still smaller capacity, a contradiction. Thus we have the "Max-Flow-Min-Cut" Theorem:

The maximal feasible flow from v[1] to v[2] is equal to the capacity of a v[1]-v[2] cut of minimal capacity.

Finding Maximal Feasible Flow

There are many algorithms to find the maximal feasible flow. An alternative to the Malhotra, Kumar, Maheshwari algorithm (Inf. Proc. Letters, vol 7, Oct 1978, pp 277-278), which relates to minimal cost path finding is given in Busacker & Saaty:

The idea is to start with a reasonable flow and incrementally increase it towards a maximum, with termination guaranteed by restricting flows and capacities to integral values. This restriction is not severe, since we can scale up the capacities to recover any necessary precision.

The tool in these incremental flow improvements is the incremental graph, I(N,F), of a network, N, with feasible flow F. I(N,F) has two directed edges for each directed edge of N, using the same set of vertices as in N. If e is an edge of N, use e for the edge in I(N,F) in the same direction, and e' for the edge in I(N,F) in the opposite direction from e in N. Assign a cost to e of zero, if F(e) is less than the capacity of e. Assign a cost to e' of zero, if F(e) is greater than zero. Otherwise assign a cost to e or e' of one.


 
     N:     *--------->-----------* , edge e in N, flow F(e)
 
 
                ------>---------      edge e in I(N,F), cost 0
               /                \     if F(e) < capacity(e)
              /                  \
     I(N,F): *                    *
              \                  /
               \                /
                ------<---------      edge e' in I(N,F), cost 0
                                      if F(e) > 0

Thus, if we could add to the flow in an edge of N, and have the flow remain feasible, we would have a cost of zero in the edge of I(N,F) in the same direction, while if we could decrease the flow we would have a cost of zero in the edge of I(N,F) in the opposite direction.

Suppose we have an entire cost zero path from source to sink. If we change the flow in N by one in the indicated direction, it is still feasible. Since the path from the source is outward, the effect of this change is to increase the net flow.



     N:     source------->--------*     flow=5, capacity=10
 
                    ----->------  cost=0
                   /            \
                  /              \      can increase flow on edge e
     I(N,F):source                *     and increase net flow
                  \              /
                   \            /
                    -----<------  cost=0

 
     N:     source-------<--------*     flow=5, capacity=10
 
                    -----<------  cost=0
                   /            \
                  /              \
     I(N,F):source                *
                  \              /      can decrease flow on edge e
                   \            /       and increase net flow
                    ----->------  cost=0

By, say, a directed version of Dijkstra's algorithm, we can find the minimal cost path in I(N,F) from v[1] to v[2]. If the cost of this path is zero, we can increase the net flow along the corresponding, path in N, and start again. If all paths have non-zero cost, the flow is already maximal. Since, when we can increase flow, we can always increase by at least one, the algorithm must terminate. In practice, we would increase the flow by the largest value which would keep the flow along the path feasible.

While this algorithm is far from optimal, it is clear and easy to implement.

Finding the Degree of Redundancy of Paths and Vertices

If we redefine the capacities of the edges of a network to all be one, then the maximal flow from v[1] to v[2] gives the number of alternate paths from v[1] to v[2], i.e. the degree of redundancy for traffic from v[1] to v[2]. Note, however, that the alternate paths may share vertices. In order to find the degree of redundancy in vertices, we have to transform the network by converting each vertex into a pair of vertices, connected by a single directed edge. All incoming edges (negatively) incident with the original vertex are made incident with the initial vertex of the new edge, while all outgoing edges (positively) incident with the original vertex are made incident with the terminal vertex of the new edge.


                |
               \|/
                |
     old: --->--*--->---
                |
               /|\
                |
 
 
                |
               \|/
                |
     new: --->--*-->--*--->---
                      |
                     /|\
                      |

By splitting the vertices this way, alternate paths can no longer share vertices of the original network. Thus the degree of redundancy of paths in this new network gives us the degree of redundancy of vertices in the original network.

Summary

In this chapter we have looked at the tools from graph theory which are most useful in network design. The incidence matrix gives us a data structure with which to represent graphs. Dijkstra's algorithm provides an efficient means of finding minimal cost paths within a graph. Using the idea of the incremental graph of a network, we can use minimal cost algorithms to find maximal flows within a network which do not violate the capacity constraints of the network. The same approach can be used to find the degree of redundancy in paths or in vertices of a network.

Exercises

Consider a finite (undirected) graph of 201 vertices, numbered 1 through 201. Suppose each even numbered vertex has an edge to the two odd numbered vertices above and below it in sequence. Suppose each odd numbered vertex in the range 1 through 199 also has an edge to vertex 201. Is the number of edges incident on vertex 201 odd or even? Why?

Take the graph from problem 1 and assign a cost to each edge equal to the square root of the magnitude of the difference in ordinals of the vertices incident on the edge. Find a minimal cost path from vertex 3 to vertex 198, from vertex 8 to vertex 3, from vertex 8 to vertex 198.

Take the graph from problem 1 and make it into a directed graph by directing all edges from lower numbered vertices to higher numbered vertices, except at vertex 201. At vertex 201 direct the edges out if they go to odd numbered vertices, and in from even numbered vertices. Assign a capacity to each edge equal to the magnitude of the difference in ordinals of the vertices incident on the edge. Find the maximal feasible flow from vertex 3 to vertex 198, from vertex 8 to vertex 3, from vertex 8 to vertex 198.

Updated 2 February 2011