Remedial Queuing Theory

© Copyright 2011 Herbert J. Bernstein
Portions derived from
G22.3023 Special Topics of Industrial Interest
Network Design and Implementation
Lecture Notes Spring 84
© Copyright 1984, 1991
Herbert J. Bernstein & Max Goldstein


One of the essential tools in network design is queuing theory, which may be viewed as a subfield of the study of "stochastic processes" (see, for example, S. Karlin, "A First Course in Stochastic Processes", Academic Press, New York, 1969, 502 pp.). Stochastic processes are those governed by the behavior of some family of random variables, R(t), where t is an index variable, often representing time. For a particular value, t0, of t, we do not necessarily have a unique value R(t0). Rather, we know the probabilities that R(t0) will assume particular values. By probability, we mean the limiting value over infinitely many trials of the ratio of the number of times R(t0) assumes the desired values to the total number of trials.

Customers and Servers

The model we use in dealing with queues considers customers and servers. We have a supply of customers requiring service. We have servers to provide that service. We do not know when any particular customer may arrive, nor do we know how long a server will take to service a particular customer. We only know the general pattern of arrival of customers and provision of service. For example, we might know that customers arrive at an overall average rate, rc, with no dependence between the arrival of any two customers, and that servers serve at an overall average rate rs, with no dependence between the service time for any two customers.

Such customer and server random variables are said to obey Poisson distributions, with a mean interarrival time of customers of 1/rc, and a mean service time of 1/rs. If there are m servers, such a model is called an M/M/m system, where M stands for Markov, since Markov chains are an effective tool to study such systems.

Markov Chains

A Markov chain is a special case of a Markov process. A Markov process is a stochastic process in which future behavior is completely determined by the behavior at the present moment rather than by memory of past behavior. A Markov chain is a Markov process in which time is considered only in discrete steps and in which there are only finitely many or at worst countably many possible states of the system. The system can then be analysed by considering the matrix of transition probabilities from state to state. We will return to Markov chains in more detail latter.

In general, if the customer random variables obey a rule X, and the server random variables obey a rule Y, and there are m servers, then the queue model is called an X/Y/m system.

For most networking problems, an M/M/1 system model is adequate, and we will consider only those.

Poisson Distribution

Let us derive the behavior of a Poisson distribution. We assume that events arrive in any time interval at the same rate that they arrive in any other time interval, and we assume that by taking a sufficiently small time interval we can limit our attention to the arrival of no events or one event. It would be reasonable to say that for a small time interval, dt, r*dt events are likely to arrive, where r is the rate of event arrivals. Equivalently, we could say that the probability, P1(dt), of an event arriving in an interval of small length dt is approximately r*dt, and that the probability, P0(dt), of no events arriving is approximately 1-r*dt.

For larger intervals it is reasonable to consider the probabilities of more than one event arriving. Let us generalize our notation to Pk(t) = the probability of exactly k events arriving in an interval of length t and derive Pk(t) by examining its derivative:

     d Pk(t)        Pk(t+dt)-Pk(t)
     ------- = lim  --------------
       dt      dt->0     dt

             = lim  -----------------------------------
                    Pk(t)*(1-r*dt-1) + P[k-1](t)*r*dt
             = lim  ---------------------------------
            = r*(P[k-1](t) - Pk(t))

In particular, we have dP0(t)/dt = -r*P0(t), so that P0(t) must be exp(-rt+const).

Since the probability of no events arriving in an interval of length 0 is one, the "const" must be absent, so

     P0(t) = exp(-r*t),


     d(P1(t))/dt = r*(exp(-r*t) - P1(t))


     d(P1(t)*exp(r*t))/dt = r - r*P1(t)*exp(r*t) + r*P1(t)*exp(r*t)

          = r

and thus

     P1(t)*exp(r*t) = r*t + const.

Since P1(0) = 0, it is clear that "const" = 0, and

     P1(t) = r*t*exp(-r*t)

Similarly, we see

     d(Pk(t)*exp(r*t))/dt = r*P[k-1](t)*exp(r*t)


                  k  -r*t
     Pk(t) = (r*t) *e    /k!

by induction.

Expected Number of Events per Unit Time

Let us use this result to compute the expected number of events per unit time:

     E(t) =  /   k*P (t)
            /____   k
            k = 0

             \          k  -r*t
          =  /   k*(r*t) *e    /k!
            k = 1
                 -r*t  \        k-1
          = r*t*e    * /   (r*t)   /(k-1)! 
                      k = 1

                 -r*t  r*t
          = r*t*e    *e

Thus r, the instantaneous rate, is also the long term rate of event arrivals.

M/M/1 Queue

Now let us consider an M/M/1 queue. Customers arrive according to a Poisson distribution with a rate rc and are served according to a Poisson distribution with rate rs. Let Q(k,n,t) be the probability that a queue of length n becomes a queue of length k in a time interval of length t. Again we compute the derivatives of the probabilities.

First we need the value of Q at a slightly perturbed time:

     Q(k,n,t+dt) = Q(k,n,t)*Q(k,k,dt) + Q(k+1,n,t)*Q(k,k+1,dt)
                   + Q(k-1,n,t)*Q(k,k-1,dt),

since, under our assumptions for Poisson distributions, we need only consider changes due to at most one arrival and/or service in a small time interval. Now, if we consider only k > 0, we have

     Q(k,k,dt) = P(no customers \& no sevices)
                 + P(1 customer \& 1 service)
               = ~ (1-rc*dt)*(1-rs*dt) + rc*dt*rs*dt
               = ~ 1 - (rc+rs)*dt,


     Q(k,k+1,dt) = P(no customers & 1 service)
               = ~ (1-rc*dt)*rs*dt = ~ rs*dt,    


     Q(k,k-1,dt) = P(1 customer & no service)
               = ~ rc*dt*(1-rs*dt) = ~ rc*dt,

so that

     Q(k,n,t+dt) = ~ Q(k,n,t)
                   + dt*(-(rc+rs)*Q(k,n,t) + rs*Q(k+1,n,t)
                                   + rc*Q(k-1,n,t)),


     d Q(k,n,t)
     ---------- = -(rc+rs)*Q(k,n,t) + rs*Q(k+1,n,t) + rc*Q(k-1,n,t).

For the case k=0, the probability of no service = 1 in a short time interval, so that Q(0,0,dt) = 1 - rc*dt. Further, since queues cannot go negative, Q(0,-1,dt)=0.


     Q(0,n,t+dt) = ~ Q(0,n,t) + dt*(-rc*Q(0,n,t) + rs*Q(1,n,t)),


     d Q(0,n,t)
     ---------- = -rc*Q(0,n,t) + rs*Q(1,n,t).

If we note that Q(k,n,0) is zero for k != n, and one for k = n, we can solve this system of differential difference equations for small time. However, we are usually interested in the behavior over large time.

If the queuing system achieves "statistical equilibrium", the initial queue length will no longer matter and the probability of arriving at any particular queue length will become independent of time. Write Q(k) for this long term probability, and we see that

     (rc+rs)*Q(k) = rs*Q(k+1) + rc*Q(k-1), for k > 0,


     rc*Q(0) = rs*Q(1).    

To understand these equations, consider a queue of length k, k > 0, as a state. We can enter that state from above by a service, and from below by the arrival of a customer. We can leave that state for the next state above by the arrival of a customer, or leave for the next state below by a service. In order to preserve the relative populations of the possible states, the flows into and out of each must balance:

                 ______             ______             ______
      rc*Q(k-2) |      | rc*Q(k-1) |      |  rc*Q(k)  |      |
     ----->-----|      |----->-----|      |----->-----|      |--
                | k - 1|           |  k   |           | k + 1|
     -----<-----|      |-----<-----|      |-----<-----|      |--
      rs*Q(k-1) |______|  rs*Q(k)  |______| rs*Q(k+1) |______|

which are precisely the equations we obtained above for k > 0. For k = 0, there are no flows to or from lower states, whence rc*Q(0) = rs*Q(1). A system of states with this property of transitions only to and from the states immediately above and below is called a "birth-death" system.

We can solve for Q(k) by induction, starting with

     rc*Q(0) = rs*Q(1),

to remove Q(1) from the next equation:

     (rc+rs)*Q(1) = rs*Q(2) + rc*Q(0)


             rc             rc
     Q(2) = (-- + 1)*Q(1) - --*Q(0)
             rs             rs
             rc      rc        rc
          = (-- + 1)*--*Q(0) - --*Q(0)  = (rc/rs)**2 *Q(0),
             rs      rs        rs

and, in the general case, by induction,


     Q(k) = (rc/rs)**k *Q(0).    


         _____       _____
         \           \
          \           \   / rc \ n
     1 =  /   Q(n) =  /  |  --  | * Q(0)
         /____       /____\ rs /
     = ----------- * Q(0)
       1 - (rc/rs)

which implies

     Q(0) = 1- (rc/rs),


     Q(k) = (1 - (rc/rs)) * (rc/rs)**k.

The expected queue length is then

         _____                _____
         \                    \
          \               rc   \    rc n
     L =  / n*Q(n) = (1 - --)* / n*(--)
         /____            rs  /____ rs
       = (1 - (rc/rs)) * (rc/rs) / (1 - (rc/rs))**2
       = (rc/rs) / (1 - rc/rs).

From this result, we see that, for an M/M/1 queue, the service rate must be significantly faster than the customer arrival rate, if the queue length is to be kept small. We can see this again by computing the expected waiting time.

Expected Waiting Time

If a customer arrives and finds the queue at length n, he can expect to wait the service time of those already waiting plus his own service time until he is done. Thus the waiting time, W, is

         n + 1
     W = -----  units of time

on the average. It follows that the expected waiting time is

     1    /  rc/rs       \
   ---- *| ---------- + 1 |
    rs    \1 - rc/rs     /

         1    /    1     \
     = ---- *| ---------- |
        rs    \ 1 - rc/rs/

     = 1/(rs - rc).

From this we see that equal service rates and customer arrival rates in an M/M/1 queue produce infinite delays, and the queue cannot be stable if rs ≤ rc. If we wish a queue of expected length 1 and an expected delay time equal to the expected time between customers, we must make the service rate twice the customer arrival rate.

The probability that the long term queue will be of a length greater than, say, m, is

        /  Q(k)
       k = m+1
        /      \  \
       |     rc |  \        k
     = | 1 - -- |  / (rc/rs) 
       |     rs | /____
        \      /  k = m+1

     = (1 - rc/rs) * (rc/rs)**(m+1) / (1 - rc/rs)
     = (rc/rs)**(m+1)
For example, if rc/rs = .5, the probability of having to cope with a queue length of at least 10 is less than .001, while, if rc/rs = .99, the probability of having to cope with a queue of at least 10 is greater than 9/10. This is consistent with the expected queue lengths of 1 and 99 respectively.

Little's Formula

While the assumption of M/M/1 queues works well in practice for networks, it would be reassuring not to require Poisson distributions. D.C. Little ("A proof of the queuing formula L=lambda*W", Operations Research 9, 1961, 383-387) proved the intuitively reasonable result that the expected length of a queue equals the rate of arrival of customers multiplied by the expected delay time for a customer, i.e. during the delay time, W, with r customers per unit time arriving, r*W customers would arrive, while by definition, a customer at the end of the queue of length, L, would have gotten to the front and been served. If the expected queue length, L, is stable, then clearly we must have r*W = L. Equivalently, we can express the expected waiting time as

     W = L/r,

which can be used to confirm the M/M/1 delay time estimate. Inasmuch as we already have this result, let us make better use of Little's formula.

Consider any set of queues, of expected lengths, Li, arrival rates, ri, and expected delay times, Wi. By Little's formula,

     Wi = Li/ri.

Now define a new queue as follows. The customers for this queue consist of the combined set of all customers for the individual queues. A customer will be considered to have been served by this combined queue when he has been served by his appropriate individual queue. The expected size of this total queue is just the sum of the individual Li, and the average rate of arrival of customers is just the sum of the individual ri.

Thus the expected delay for a customer of the overall queue is

                                          /__ ri*Wi
       W = summation(Li)/summation(ri)  = ---------
                                          /__ ri

This is intuitively reasonable, since we are simply saying that the expected delay per queue is the average of the individual queue delays weighted by the customer traffic for the queues.

We can now apply this to M/M/1 queues. The delay for the ith queue is

     Wi = 1/(ui - ri),

where ui is the rate of queue services and ri is the rate of queue customer arrivals. In this case, the average delay is

          /__ ri/(ui - ri)
     W = -----------------
             /__  ri   

Multiple Servers

In the design of networks, the problem of combining multiple servers arises. If we assume one supply of customers at rate rc, and a set of m servers each serving at rate rs, then, if the long term probability of achieving a total queue length k is called Q(k), we can relate these probabilities by:


     Q(0)*rc = Q(1)*rs
     Q(1)*rc + Q(1)*rs = Q(0)*rc + Q(2)*2*rs
     Q(2)*rc + Q(2)*2*rs = Q(1)*rc + Q(3)*3*rs
     Q(k)*rc + Q(k)*k*rs = Q(k-1)*rc + Q(k+1)*(k+1)*rs
     Q(m)*rc + Q(m)*m*rs = Q(m-1)*rc + Q(m+1)*m*rs
     Q(n)*rc + Q(n)*m*rs = Q(n-1)*rc + Q(n+1)*m*rs

by matching "flows" among the states, and noting that we cannot have a service in an unoccupied queue.

               ______          ______          ______
              |      |   rc   |      |   rc   |      |
       --->---|Q(k-1)|--->----| Q(k) |--->----|Q(k+1)|--->---
              |      |        |      |        |      |
       ---<---|      |---<----|      |---<----|      |---<---
              |______| k*rs   |______|(k+1)*rs|______|

It follows that

     Q(1) = (rc/rs)*Q(0)
     Q(2) = (rc/rs)*Q(1)/2
     Q(3) = (rc/rs)*Q(2)/3
     Q(k) = (rc/rs)*Q(k-1)/k
     Q(m) = (rc/rs)*Q(m-1)/m
     Q(n) = (rc/rs)*Q(n-1)/m    

This implies that

     Q(k) = (rc/rs)**k*Q(0)/k!, k = 1,..m,
     Q(n) = (rc/(m*rs))**n * Q(0) * m**m /m!

This result is very similar to the one that we obtained for an M/M/1 queue for the longer queue lengths, if we consider an m*rs rate server, but is quite different for the shorter queue lengths.

We can obtain the actual probabilities by summing Q(0) + Q(1) + ... to 1, but in order compute expected queue lengths, it is sufficient to leave the probabilites in this form to compute

     E = 0*Q(0) + 1*Q(1) + 2*Q(2) + .. + k*Q(k) + ...
       = Q(0) * ( 1*Q(1)/Q(0) + 2*Q(2)/Q(0) + .. k*Q(k)/Q(0) + ... )

         _____                      _____
         \              /  \ k      \           /     \ k
          \       1    | rc |     m  \    k    |  rc   |
          /     ----- *| -- |  + m * /    -- * | ----- |
         /____  (k-1)! | rs |       /____ m!   | m * rs|
         k = 1          \  /        k = m+1     \     /
     = ------------------------------------------------------
         _____                      _____
         \          /  \ k          \            /     \ k
          \     1  | rc |         m  \    1     |  rc   |
          /    -- *| -- |      + m * /    -- *  | ----- |
         /____ k!  | rs |           /____ m!    | m * rs|
         k = 0      \  /             k = m       \     /

which may be evaluated in closed form by noting that the infinite sum in the denominator is just a geometric series, and that the infinite sum in the numerator is the derivative of a geometric series.

However, in many cases, it is a sufficiently accurate approximation to consider the case where m goes to infinity. This leaves us with a numerator which is (rc/rs) times an exponential and a denominator in which the same exponential appears. Thus, in the limit, the expected queue length for such a queuing system is just the ratio of the rate of arrival of customers to the rate of services by a single server.

To see how realistic such an approximation can be, consider the case where rc = 10 and rs = 1. While the queue length is infinite for 10 servers, it stabilizes to a queue length of 10.000000000075 with 11 servers, and is effectively an exact 10 with 13 servers, giving an average delay per customer of one service time. One should not think this a full substitute for a single, faster server, since use of a single server with service rate rs = 12, would give an expected queue length of only 5, i.e. half the service delay of the multiple server queue. On the other hand, when reliability considerations lead one to use of, say, two half speed servers, rather than one full speed server, the delay penalty can be rather small. Say rc = rs = 1, and m = 2, then E = 4/3; while for rc = 1, rs = 2, m = 1, we have E = 1.

Binomial Distribution Queue

It would seem that we have made a serious mistake in our choice of models thus far. Many, if not most realistic communications problems are based on discrete time, not on the continuous time model we have used. Typically, bits arrive, if at all, at some fixed rate. In a given bit time, at most one bit can arrive, not several as in a Poisson distribution. We will now consider the effect of using discrete time intervals and show that the continuous time models are a good approximation for large time.

Suppose time is counted in integral multiples, t = n*t0, of some minimal time unit, t0, in which a single event may or may not occur. If the short-term rate of arrival of events is r, then the probability, p, of the arrival of an event in the interval t0 is r*t0 = p. The probability, Bk(t), of exactly k events arriving in time t is then governed by the binomial distribution. That is,


                 n!     k      n-k
     Bk(t) = ---------*p *(1-p)

since the binomial coefficient, n!/((n-k)!*k!), is the number of possible patterns of k time slots selected from among the n slots in time t, and p**k*(1-p)**(n-k) is the probability that events will occur in exactly k selected slots. Let us rewrite Pk(t), the probability of exactly k events governed by a Poisson distribution, in the same terms:

                  k  -r*t
     Pk(t) = (r*t) *e    /k!
                     k  -r*n*t0
           = (r*n*t0) *e       /k!
                  k  -p*n
           = (p*n) *e    /k!


     Bk(t)/Pk(t) =
                n!     k      n-k
             --------*p *(1-p)                                 n
             (n-k)!*k!                n*(n-1)*...*(n-k+1)*(1-p)      
           = --------------------   = -------------------------- 
               k  k                          k      k  -p*n
              n *p   -p*n                   n *(1-p) *e
             1*(1-1/n)*(1-2/n)*...*(1-(k-1)/n)) *(1-p)
           = ------------------------------------------
                                k  -p*n
                           (1-p) *e


                   n    -z
     lim  (1 - z/n)  = e

so, if p*n is bounded as n goes to infinity, we can substitute

                 n          n      -pn
     (1 - (pn/n))  = (1 - p)  for e

and the ratio of Bk(t) to Pk(t) consists of terms all going to 1 and itself goes to one. Thus, for small p = r*t0 and large n = t/t0, the binomial distribution is close to the Poisson distribution. In practical terms, this means that we can use the Poisson distribution when r*t0 is small and the times under consideration are long compared to t0.

Consider a queue in which customers arrive according to a binomial distribution at a rate rc and are served according to a binomial distribution at a rate rs. Suppose the minimal time interval, t0, is the same for both distributions. Let C(k,m,t) be the probability that a queue of length m becomes a queue of length k in an interval of length t. As in the analysis of an M/M/1 queue above:

     C(k,m,t+t0) = C(k,m,t)*C(k,k,t0) + C(k+1,m,t)*C(k,k+1,t0)
                          + C(k-1,m,t)*C(k,k-1,t0)

and, for k > 0,

     C(k,k,t0) = P(no customers & no services) + P(1 customer & 1 service)
               = (1 - rc*t0)*(1 - rs*t0) + rc*rs*t0**2
               = 1 - (rc+rs)*t0 + 2*rc*rs*t0**2,
     C(k,k+1,t0) = P(no customers & 1 service)

               = (1 - rc*t0)*rs*t0 = rs*t0 - rc*rs*t0**2
     C(k,k-1,t0) = P(1 customer & no services)
               = rc*t0*(1 - rs*t0) = rc*t0 - rc*rs*t0**2

while, for k = 0, there can be no service in time t0, so that C(0,0,t0) = 1-rc*t0, and, since the queue cannot go negative, C(0,-1,t0) = 0. Thus

     C(0,m,t+t0) = C(0,m,t)*C(0,0,t0) + C(1,m,t)*C(0,1,t0)
          = C(0,m,t)*(1 - rc*t0) + C(1,m,t)*(rs*t0 - rc*rs*t0**2)

We could analyse these difference equations for small time. However, for our applications, the large time case is most interesting, in which the initial queue length has been forgotten. If we let C(k) be the probability of going to a queue length of k over long time, then we obtain the relations:

     rc*C(0) = (rs - rc*rs*t0)*C(1)
             = (rs-rc*rs*t0)*C(k+1) + (rc-rc*rs*t0)*C(k-1)

If we define

     rs' = rs - rc*rs*t0
     rc' = rc - rc*rs*t0


     rc*C(0) = rs'*C(1)
     C(k+1) = (rc'/rs' + 1)*C(k) - (rc'/rs')*C(k-1)

which, as with the M/M/m case above, has a solution similar to that for the M/M/1 queue, but with a slight adjustment in the probabilities of the lower queue lengths.

More on Markov Chains

A detailed study of Markov chains is beyond the scope of these notes. However, some familiarity with the basic terms and concepts is desirable.

The simplest problems in probability theory are those in which the events being studied are totally independent. In a series of papers from 1906 to 1924, A.A. Markov investigated the more difficult problems of dependent events. In our queuing examples, we reached each queue state from ones below or above. In general, one could have a system which might be in any of the states drawn from some set, X, and consider the probabilities of transitions over time from any one of those states, x, to any other one of those states, y. The resulting transition probability, p[x,y], is conditional on starting in state x.

If the set of states is finite, and the transition probabilities do not depend on time, then one can find the probability of going from state x to state y through some unknown intermediate state by computing the sum of all p[x,z]*p[z,y] for all possible intermediate states z. Thus we can form the matrix of all two step transitions by forming the product of the matrix of one step transitions with itself. A Markov chain is a process which we can follow in this manner. The approach can be generalized to continuous time Markov chains, such as the M/M/m queues, above, in which time varies continously, but the states are at worst countable, and further generalized to Markov processes, in which the states may also form a continuum.


In this chapter we have reviewed the basics of queuing theory. We have examined the Poisson distribution and its discrete time analogue, the binomial distribution. We have looked at the behavior of queues formed by customers arriving according to a Poisson distribution and being served by servers providing service according to another Poisson distribution, and we have seen that infinite queues will develop if the customers arrive just at the service rate.


  1. Prove that r*t is the expected number of events in a time interval, t = n*t0, in which event arrival is governed by a binomial distribution with short-term rate of event arrival r over minimal time-step t0.

  2. A. K. Erlang studied the behavior of telephone systems, and developed the basic formulae used for allocation of sufficient trunk capacity in telephone systems. Derive the Erlang formulae giving the probability of finding all lines busy in a telephone exchange having m outgoing lines, a single queue of incoming lines, and Poisson distributed traffic, under the assumption that all traffic for unavailable lines is lost, instead of queued.

  3. What is the expected delay time for a queuing system composed of 100 smaller queuing systems, fifty of which have a customer arrival rate of 10 per second and a expected delay time of 5 seconds each, and 50 of which have a customer arrival rate of 20 per second and expected queue lengths of 1 each?

  4. Suppose an M/M/1 queuing system has a customer arrival rate of 5 per second and an expect queue length of 3. What is the service rate?

Updated 2 Feb 2011