Probability, a Brief Introduction

© Copyright 2008, 2012 Herbert J. Bernstein

This is a brief introduction to the concepts of probability. Sometimes we are certain of the outcome of an activity, but often, as in the toss of a coin, the spin of a roulette wheel or the dealing of hands from a shuffled card deck, we are not certain. We may think that some outcomes are more likely than others, or all outcomes may be equally likely.

Consider a coin. Unless we do something special to weight one side of the coin more than the other, when we toss it, we expect that there is an equal chance that the coin will land heads up or tails up.


[Coin images from http://www.busyteacherscafe.com/Coin%20Clipart.htm]

Suppose we toss a coin n times. Each toss is an independent event, i.e. the outcomes of later tosses do not depend on the outcome of any one such toss. In the long run we expect that half of the tosses will have an outcome of heads and that half the tosses will have an outcome of tails. That is, given a sufficiently large number, n, of trial tosses, we expect the ratio, h/n, of the number of (favorable) heads outcomes, h, to the total number of tosses to be close to 1/2, and, as n increases, we expect this ratio to tend closer to 1/2. This does not mean that any given run of tosses will come up half heads and half tails. It is possible to toss a fair coin ten times and get ten tails in a row or ten heads in a row. However, in the long run, we expect the number of heads and tails to even out.

As a generalization, we say that the probability, P(Ai), of some outcome, Ai, called a "favorable outcome", in a finite set, U = {A1, . . . , Am}, of all possible disjoint outcomes, is the limit of the ratios

Pn = (number of favorable outcomes in n trials) / (number of trials, n)

as the number of trials, n, goes to infinity. The individual Pn are called empirical probabilities, and serve as estimators of the actual probability.
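We can watch these empirical probabilities settle toward 1/2 with a short simulation. This is a minimal Python sketch; the seed and the numbers of tosses are arbitrary choices made for illustration:

```python
import random

def empirical_probability(n, seed=0):
    """Fraction of heads in n tosses of a simulated fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

# Short runs can stray far from 1/2; long runs settle close to it.
print(empirical_probability(10))
print(empirical_probability(100_000))
```

Running this with different seeds shows the same pattern: individual short runs vary widely, but the long-run ratio stays near 1/2.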

If we consider either outcome Ai or outcome Aj for disjoint outcomes to both be favorable, then

P(Ai or Aj) = P(Ai) + P(Aj)

and the probability, P(S), of any subset S of U is the sum of the probabilities in S, which sum is a number between 0 and 1. Further P(not S) = 1 - P(S).

Given two sets S and T, write the union of S and T, consisting of the outcomes in either S or T, as S ∪ T, and the intersection of S and T, consisting of outcomes in both S and T, as S ∩ T. Clearly, by counting and discarding duplicates, the probability of the union is

P(S ∪ T) = P(S) + P(T) - P(S ∩ T).

Now, if S and T are (stochastically) independent of each other, that is outcomes in S occur regardless of whether or not outcomes in T occur, and vice-versa, then

P(S ∩ T) = P(S) * P(T)

The probability, P(S|T), of S given T, is P(S ∩ T)/P(T), which says that

P(S|T) = P(S), if S and T are independent.
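These rules can be checked by brute-force counting on a small sample space. Here is an illustrative Python sketch; the events S, "first die is even", and T, "second die is even", are chosen because they are independent:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely rolls of two fair dice.
outcomes = set(product(range(1, 7), repeat=2))
S = {o for o in outcomes if o[0] % 2 == 0}  # first die is even
T = {o for o in outcomes if o[1] % 2 == 0}  # second die is even

def P(event):
    """Probability of an event by counting favorable outcomes."""
    return Fraction(len(event), len(outcomes))

print(P(S | T) == P(S) + P(T) - P(S & T))  # union rule
print(P(S & T) == P(S) * P(T))             # independence: product rule
print(P(S & T) / P(T) == P(S))             # hence P(S|T) = P(S)
```

All three comparisons print True, confirming the union rule, the product rule for independent events, and the conditional-probability identity on this sample space.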

For many problems, we can compute the probability of a favorable outcome by counting all the ways in which that favorable outcome can be achieved and dividing by the total number of ways in which all outcomes can be achieved. Consider 2 cubical dice having faces with 1, 2, 3, 4, 5 or 6 dots on them. There are 6 ways in which the first die can land and 6 ways in which the second die can land, making 36 possible outcomes. Each of these 36 possibilities has a probability of 1/36. If we consider a total of 7 to be a favorable outcome, we can compute the probability of such a result as

P(two dice add up to seven) = P(die1 = 1 and die2 = 6) +
P(die1 = 2 and die2 = 5) +
P(die1 = 3 and die2 = 4) +
P(die1 = 4 and die2 = 3) +
P(die1 = 5 and die2 = 2) +
P(die1 = 6 and die2 = 1)  

for a total probability of 6/36 = 1/6. See wizardofodds.com/gambling/dice.html for more information on dice probabilities.
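The same count can be made by enumerating all 36 outcomes in a few lines of Python:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))
sevens = [(a, b) for a, b in outcomes if a + b == 7]
p_seven = Fraction(len(sevens), len(outcomes))
print(len(sevens), p_seven)  # 6 favorable outcomes out of 36, i.e. 1/6
```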

If a numeric value v(Ai) is associated with each possible outcome Ai, then the expected value of v on S, E(v|S), for a set of outcomes, S, is the weighted sum of the v(Ai)*P(Ai|S) over the outcomes Ai in S.

Consider a lottery game with 48 possible numbers, from which you need to hold 6 numbers matching the winning selection of 6 distinct numbers in order to win. There are 48 possibilities for the first number, leaving 47 possibilities for the second number, 46 for the third, 45 for the fourth, 44 for the fifth and 43 for the last number. Thus there are 48*47*46*45*44*43 = 8,835,488,640 possible outcomes. In order to win, it is acceptable for any of the 6 winning numbers to be in the first position, any of the 5 remaining numbers to be in the second position, any of the 4 remaining numbers to be in the third position, any of the 3 remaining numbers to be in the fourth position, any of the 2 remaining numbers to be in the fifth position, but only the one remaining number in the last position, giving 6*5*4*3*2*1 = 720 favorable outcomes, for a probability of winning of 720/8,835,488,640 = 1/12,271,512. Thus, for a $1,000,000 prize, the expected value of each $1 ticket is $1,000,000/12,271,512, or slightly more than 8 cents. See www.thelotterysite.com/lottery_odds.htm for additional detail on the probability of winning a lottery.
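A short Python check of this arithmetic; math.comb counts the order-independent selections directly:

```python
from math import comb, factorial

ordered = factorial(48) // factorial(42)  # 48*47*46*45*44*43 ordered draws
favorable = factorial(6)                  # 720 orderings of the 6 winning numbers
odds = ordered // favorable
print(ordered)              # the 8,835,488,640 possible ordered outcomes
print(odds)                 # one chance in 12,271,512
print(comb(48, 6) == odds)  # the binomial coefficient gives the same count
```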

If you return to wizardofodds.com/gambling/dice.html you will see that the "field bet" in craps has an expected value to the player of 2.78% less than the amount bet, because, even though there are 7 outcomes for which the player wins and only 4 for which the house wins, the sum of the probabilities for the player is slightly less than the sum of the probabilities for the house.

Let us look at a few more games. Consider the numbers game, in which the player picks a 3-digit (000 -- 999) or 4-digit (0000 -- 9999) number. In the first case there are 1000 equally likely possible outcomes, so the probability of winning is 1/1000, and in the second case there are 10,000 equally likely possible outcomes, so the probability of winning is 1/10,000. If the 3-digit numbers game pays you $500, as in New York, then the expected return on each $1 is only $.50. If the 4-digit numbers game, called WIN 4 in New York, pays you $5000, then the expected return on each $1 is still only $.50.
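The expected returns work out in a couple of lines of Python; the payouts used here are the New York figures quoted above:

```python
# Expected return per $1 bet on the numbers games described above.
p3, payout3 = 1 / 1000, 500     # 3-digit game: probability and payout
p4, payout4 = 1 / 10000, 5000   # 4-digit (WIN 4) game
print(p3 * payout3)  # 0.5 dollars expected back per dollar bet
print(p4 * payout4)  # also 0.5
```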

It is important to realize that each of the organizations that run gambling enterprises, whether it is a casino, a state, or your local religious organization, uses games for which the expected return for the player is negative. This does not mean that you can never win, but it means that if you keep on playing, then over the long run you will lose.

Many people enjoy card games. Winning them requires an understanding of the probabilities of various hands. You can find a table of the probabilities of various poker hands at http://en.wikipedia.org/wiki/Poker_probability. Let us examine one case: the chances of a pair of matching cards in a hand of 5 cards from a single deck of 52 cards of 4 suits, each suit with 13 cards. If we cared about the order of cards in our hand, the total number of possible hands would be 52*51*50*49*48, because any of the 52 cards could be the first card, leaving 51 possibilities for the second card, etc., for 311,875,200 possible ordered hands. However, in poker, we don't care about the order of the cards in our hand, so for any given hand, we would be just as happy with any of the hands that puts any of the 5 cards we were dealt in the first place, then any of the 4 remaining cards we were dealt in the second place, etc. We can therefore group the possible ordered hands into sets of 5*4*3*2 = 120 hands each, giving 2,598,960 possible order-independent hands.

In order to have a pair, we have to have two cards of the same numeric value. There are 13 possibilities for that value. The first card of the pair has to come from one of the 4 possible suits and the second from one of the other 3 suits, for a total of 4*3 = 12 order-dependent possibilities, but an ace of spades with an ace of clubs is the same pair as an ace of clubs with an ace of spades, so we group the 12 order-dependent pairs of a given numeric value into 6 order-independent pairs. That leaves 3 more cards in the order-independent hand. In order not to make our pair into 3 or 4 of a kind, we accept as a pair only hands in which those three cards are drawn only from the cards not having the same numeric value, and once we pick a value, we are not allowed to repeat it, giving us 4*12*4*11*4*10 possible order-dependent partial hands of 3 cards, which we can shuffle 3*2*1 ways, so that the total number of order-independent pair hands is 13*(4*3/2)*(4*12*4*11*4*10/6) = 1,098,240 out of 2,598,960, or 42.3%.
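The counting argument above condenses to a few binomial coefficients. A Python sketch:

```python
from math import comb

total_hands = comb(52, 5)  # 2,598,960 order-independent 5-card hands
# One pair: choose the pair's value, its 2 suits, then 3 distinct other
# values, and a suit for each of those 3 remaining cards.
pair_hands = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3
print(total_hands, pair_hands)
print(pair_hands / total_hands)  # about 0.4226, i.e. 42.3%
```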

Probability Distributions

Some random experiments produce numeric values as their outcomes. We call such random numeric values "random variables". Random variables may be discrete, taking values from a finite or countably infinite set, or continuous, taking values from a continuum of real numbers.

For a finite discrete distribution of N values, x1, ..., xN, we define the "mean" or "population mean", μ, and the "variance" or "population variance", σ², as

μ = Σk=1..N xk P(xk)

σ² = Σk=1..N (xk - μ)² P(xk)

The square root of the variance is called the "standard deviation". A "Z-score" is

Z(x) = (x - μ) / σ
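As a concrete example, here is a minimal Python sketch computing μ, σ² and a Z-score for a fair six-sided die (the choice of a die is an arbitrary illustration):

```python
from fractions import Fraction

# Discrete distribution: a fair die, values 1..6, each with probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in values)               # population mean
var = sum((x - mu) ** 2 * p for x in values)  # population variance
sigma = float(var) ** 0.5                     # standard deviation

def z_score(x):
    return (x - float(mu)) / sigma

print(mu, var)     # 7/2 and 35/12
print(z_score(6))  # about 1.46 standard deviations above the mean
```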

For a continuous real random variable, we usually cannot say anything useful about the probability of the outcome being one particular value; in general that probability is zero. Instead we consider

P(a ≤ x ≤ b)

the probability of the outcome lying in a particular interval [a,b]. The probability density function, ρ(x), of P is a function with the property that

P(a ≤ x ≤ b) = the area under ρ(x) for x between a and b.

ρ(x) is never negative and the total area under all of ρ is always 1.

There are two very important real probability density functions, the Gaussian distribution (or normal distribution) and the continuous Poisson distribution (more commonly called the exponential distribution). When we take many samples from any distribution, the resulting distribution of their sum or average comes increasingly close to a normal distribution as the number of samples increases. Formally, the Central Limit Theorem says that the average of n samples from a distribution with an arbitrary density function of expected value μ and standard deviation σ approximates a normal distribution of expected value μ and standard deviation σ/√n.
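The Central Limit Theorem can be seen in a small simulation. This Python sketch averages n = 100 uniform(0,1) samples, an arbitrary choice of source distribution for which μ = 1/2 and σ = 1/√12:

```python
import random

def sample_means(n, trials, seed=0):
    """Mean of n uniform(0,1) samples, repeated `trials` times."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

means = sample_means(n=100, trials=2000)
grand_mean = sum(means) / len(means)
spread = (sum((m - grand_mean) ** 2 for m in means) / len(means)) ** 0.5

# Uniform(0,1) has mu = 1/2 and sigma = 1/sqrt(12) ~ 0.2887, so the means
# should cluster around 0.5 with standard deviation near sigma/sqrt(100).
print(grand_mean, spread)
```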

The Poisson distribution is seen in models of the arrival times of independent events, such as radioactive decays or wear-and-tear failures. It is the basis for the saying that "trouble comes in threes", in the sense that multiple events from Poisson distributions tend to come in clumps.

The probability density function of the Gaussian distribution is

ρ(x) = ( 1 / (σ √(2 π)) ) exp( -(x - μ)² / (2 σ²) )

which produces a bell-shaped curve centered on μ with "standard deviation" σ.

The probability density function of the Poisson distribution is

ρ(x) = λ exp(-λ x), x ≥ 0

with mean 1/λ and variance 1/λ².
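A quick simulation supports these values. This Python sketch uses the standard library's expovariate; the rate λ = 2 and the seed are arbitrary choices:

```python
import random

lam = 2.0  # the rate parameter lambda
rng = random.Random(1)
samples = [rng.expovariate(lam) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be near 1/lambda = 0.5 and 1/lambda**2 = 0.25
```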