Combinations

Let Ω be a set with n elements; we want to count the number of distinct subsets of the set Ω that have exactly k (k≤n) elements. It is assumed that the order of selections does not matter. The empty set and the set Ω are considered to be subsets of Ω. The empty set is usually denoted by ∅.

Example: Let \( \Omega = \left\{ a, b, c \right\} \) be a set with three distinct elements, labeled by a, b, and c. The subsets of Ω are

\[ \varnothing , \ \{ a \} , \ \{ b \} , \ \{ c \} , \ \{ a, b \} , \ \{ a, c \} , \ \{ b, c \} , \ \{ a, b, c \} . \]

So there are three sets with two elements out of eight subsets. ■

The number of distinct subsets with k elements that can be chosen from a set with n elements is denoted by \( \binom{n}{k} , \) and is pronounced “n choose k.” The number \( \binom{n}{k} \) is called a binomial coefficient.

The binomial coefficients are calculated according to the formula:

\[ \binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n \cdot (n-1) \cdot (n-k) \cdot \cdots \cdot (n-k+1)}{1\cdot 2 \cdot 3 \cdot \cdots \cdot k} , \]

where \( \displaystyle n^{\underline{k}} \) is k-th falling factorial and k! is a regular factorial. Note that the binomial coefficient as well as the falling factorial are defined for arbitrary real number n, but the lower index is always assumed to be an integer. In this case, it is always assumed that n ≥ k; otherwise the binomial coefficient is zero. Also, for negative integers k, the binomial coefficient is also zero. There are two simple formulas:

\[ \binom{n}{0} =1 \qquad\mbox{and} \qquad \binom{n}{n} =1 . \]

R has a dedicated comamnd choose(n,k) for the binomial coefficient:

Example: In the above example, there is one subset with no elements, three subsets with exactly 1 element, three subsets with exactly 2 elements, and one subset with exactly 3 elements. Therefore,

\[ \binom{3}{0} =1, \quad \binom{3}{1} =\binom{3}{2} = 3, \quad \binom{3}{3} = 1. \]

When we add all these binomial coefficients, we get

\[ \binom{3}{0} + \binom{3}{1} + \binom{3}{2} + \binom{3}{3} = 1 + 3 + 3 + 1 = 8 = 2^3 . \]

■

Theorem: (Recurrence) For integers n and k, we have the first order recurrence relation:

\[ \binom{n}{k} = \binom{n}{k-1} + \binom{n-1}{k-1} . \]

We wish to choose a subset of k elements. Choose an element u of Ω. Assume first that we do not want u in the subset. Then we must choose the k elements from a set of n-1 elements; this can be done in \( \binom{n-1}{k} \) ways. On the other hand, assume that we do want u in the subset. Then we must choose the other k-1 elements from the remaining n-1 elements of Ω; this can be done in \( \binom{n-1}{k-1} \) ways. Since u is either in our subset or not, the number of ways that we can choose a subset of k elements is the sum of the number of subsets of k elements which have u as a member and the number which do not---this is what the above recurrence states.

The recurrence relation together with the boundary conditions determines completely the numbers \( \binom{n}{k} . \) We can use these relations to determine the famous triangle of Pascal (or Pascal's triangle), which exhibits all these numbers in matrix form:

Pascal's triangle.
	k=0	1	2	3	4	5	6	7	8	9	10
n=0	1
1	1	1
2	1	2	1
3	1	3	3	1
4	1	4	5	4	1
5	1	5	10	10	5	1
6	1	6	15	20	15	6	1
7	1	7	21	35	35	21	7	1
8	1	8	28	56	70	56	28	8	1
9	1	9	36	84	126	126	84	36	9	1
10	1	10	45	120	210	252	210	120	45	10	1

The following R script shows the entries in Pascal triangle.

R code version of choose() [simplistic; warning for k < 0]:

The binomial coefficients are used to calculate arbitrary power function:

\[ \left( 1 + x \right)^n = 1 + n\,x + \binom{n}{2} x^2 + \cdots + x^n = \sum_{k\ge 0} \binom{n}{k} x^k , \]

where n is any real number. Upon trankating the series, we obtain an approximation to a power value. For example, the square root can be approximated as

Example: Poker players sometimes wonder why a four of a kind beats a full house. A poker hand is a random subset of 5 elements from a deck of 52 cards. A hand has four of a kind if it has four cards with the same value—for example, four sixes or four kings. It is a full house if it has three of one value and two of a second—for example, three twos and two queens. Let us see which hand is more likely. How many hands have four of a kind? There are 13 ways that we can specify the value for the four cards. For each of these, there are 48 possibilities for the fifth card. Thus, the number of four-of-a-kind hands is 13*48 = 624. Since the total number of possible hands is \( \binom{52}{5} = 2598960, \) the probability of a hand with four of a kind is 624/2598960 = 0.00024.

Now consider the case of a full house; how many such hands are there? There are 13 choices for the value which occurs three times; for each of these there are \( \binom{4}{3} = 4 \) choices for the particular three cards of this value that are in the hand. Having picked these three cards, there are 12 possibilities for the value which occurs twice; for each of these there are \( \binom{4}{2} = 6 \) possibilities for the particular pair of this value. Thus, the number of full houses is 13*4*12*6 = 3744, and the probability of obtaining a hand with a full house is 3744/2598960 = 0.0014. Thus, while both types of hands are unlikely, you are six times more likely to obtain a full house than four of a kind.

Theorem: (Basic principle of counting) The total number of possible outcomes of a list of decisions (making selections from various categories) is equal to the product of the numbers of choices for each decision (or category). ▣

Example: If you purchase three pairs of jeans, five shirts, and four pairs of shoes, how many new outfits (consisting of a new pair of jeans, a new shirt, and a new pair of shoes) would you have ?

Because there are three categories, selecting an outfit requires a list of three decisions: you must select one pair of jeans, one shirt, and one pair of shoes. Therefore, you have three categories to choose from: jeans, shirts, and shoes. The order in which the decisions are made does not effect the overall outfit.

Our first decision (jeans) has three choices. The second decision is to select a shirt out of five shirts. Our third decision is to select a pair of shoes, for which there are four choices.

Hence, the total number of outfits could have been obtained by multiplying the number of choices for each decision:

\[ 3 \cdot 5 \cdot 4 = 60 \qquad\mbox{outfits}. \]

■

Basics

Probability

Random Variables

Multivariable Distributions

Discrete Random Variables

Continuous Random Variables

Sampling Distributions

Statistical Inference I: Classical Methods

Statistical Inference II: Bayesian Inference

Intro to Random Processes

Some Important Random Processes

Intro to Simulation Using MATLAB

Intro to Simulation Using R

Glossary

Reference

Statistics