
Bernoulli distribution

$\mathbf{Definition.}$ (Bernoulli trials). Binary experiments that are independent with constant success probability are called Bernoulli trials.

A random variable $X$ associated with a single Bernoulli trial can then be defined as

\[\begin{align*} X = \begin{cases} 1 & \text{ (success) } \\ 0 & \text{ (failure) } \end{cases} \Rightarrow X \sim \text{Bernoulli}(p): p(x) = p^x (1-p)^{1-x} \; (x = 0, 1) \end{align*}\]

We call this distribution the Bernoulli distribution with parameter $p$.
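
As a quick numerical sanity check (not part of the definition), here is a minimal NumPy sketch that draws Bernoulli samples and compares the sample mean and variance with $p$ and $p(1-p)$; the value $p = 0.3$ is an arbitrary choice.

```python
import numpy as np

# Minimal sketch: draw Bernoulli(p) samples (B(1, p) in NumPy) and compare the
# sample mean and variance with p and p(1 - p). The value p = 0.3 is arbitrary.
rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(n=1, p=p, size=100_000)

print(x.mean(), x.var())    # ≈ 0.3 and ≈ 0.21
```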

Binomial distribution

One can repeat a Bernoulli trial multiple times, e.g., flipping a coin $n$ times. If we define a random variable $X$ as the number of successes in $n$ Bernoulli trials with success probability $p$, it follows the binomial distribution:

\[\begin{align*} X \sim B(n, p) \end{align*}\]


$\mathbf{Example.}$

Let the independent random variables $X_1, X_2, X_3$ have the same cdf $F(x)$. Let $Y$ be the middle value (median) of $X_1, X_2, X_3$. Then, the cdf of $Y$ is

\[\begin{align*} P(Y \leq x) = P(W \geq 2) = \binom{3}{2} F(x)^2 (1 - F(x)) + \binom{3}{3} F(x)^3 \end{align*}\]

where $W$ is the number of $X_i$ such that $X_i \leq x$, so that $W \sim \text{B}(3, F(x))$.
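
This example can be checked numerically. The sketch below is illustrative only: the standard normal cdf is an arbitrary choice for $F$, and $x = 0.5$ an arbitrary evaluation point (NumPy/SciPy assumed available).

```python
import numpy as np
from scipy.stats import norm

# Simulation sketch of the example: the middle value of three i.i.d. draws.
# The standard normal is an arbitrary choice for F, and x = 0.5 an arbitrary point.
rng = np.random.default_rng(1)
y = np.median(rng.standard_normal((100_000, 3)), axis=1)

x = 0.5
F = norm.cdf(x)
theory = 3 * F**2 * (1 - F) + F**3      # P(W >= 2), W ~ B(3, F(x))
print((y <= x).mean(), theory)          # should agree to ~2 decimals
```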



PMF

\[\begin{align*} p(x) = \binom{n}{x} p^x (1 - p)^{n - x} \; (x = 0, 1, \cdots, n) \end{align*}\]



MGF

\[\begin{align*} M(t) = \mathbb{E}(e^{tX}) = \sum_{x=0}^n \binom{n}{x} (e^t p)^x (1 - p)^{n-x} = (1 + e^t p - p)^n \end{align*}\]

Mean, Variance

\[\begin{align*} \mathbb{E}[X] &= np \\ \text{Var}[X] &= np(1-p) \end{align*}\]
$\mathbf{Proof.}$
\[\begin{align*} M'(t) &= n e^t p (1 + e^t p - p)^{n-1} \\ M''(t) &= n (n-1) e^{2t} p^2 (1 + e^t p - p)^{n-2} + n e^t p (1 + e^t p - p)^{n-1} \\ \mathbb{E}[X] &= M'(0) = np \\ \text{Var}[X] &= M''(0) - M'(0)^2 = n(n-1)p^2 + np - n^2 p^2 = np(1-p)._\blacksquare \end{align*}\]
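
For a quick symbolic check of this computation (not part of the proof), one can differentiate the mgf with SymPy, assuming it is available:

```python
import sympy as sp

# Symbolic sketch: differentiate the mgf M(t) = (1 + e^t p - p)^n at t = 0 to
# recover E[X] = np and Var[X] = np(1 - p).
t, p = sp.symbols('t p', positive=True)
n = sp.symbols('n', positive=True, integer=True)
M = (1 + sp.exp(t) * p - p) ** n

EX = sp.diff(M, t).subs(t, 0)              # M'(0)  = n*p
EX2 = sp.diff(M, t, 2).subs(t, 0)          # M''(0) = n*(n-1)*p**2 + n*p
print(sp.simplify(EX), sp.simplify(EX2 - EX**2))   # n*p and n*p*(1 - p), up to rearrangement
```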



$\mathbf{Thm\ 1.1.}$ Let $X_1, \cdots, X_m$ be independent random variables such that $X_i \sim B(n_i, p)$. Then,

\[\begin{align*} Y = \sum_{i=1}^m X_i \sim B(\sum_{i=1}^m n_i, p) \end{align*}\]
$\mathbf{Proof.}$
\[\begin{align*} M_i (t) &= (1 + e^t p - p)^{n_i} \\ M (t) = \mathbb{E}[e^{tY}] &= \prod_{i=1}^m \mathbb{E}[e^{tX_i}] = \prod_{i=1}^m M_i (t) = (1 + e^t p - p)^{\sum_{i=1}^m n_i}, \end{align*}\]

which is the mgf of $B(\sum_{i=1}^m n_i, p)._\blacksquare$
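
A small simulation sketch of the theorem (the $n_i$ and $p$ below are arbitrary; NumPy and SciPy assumed available): the empirical pmf of the sum should match the pmf of $B(\sum n_i, p)$.

```python
import numpy as np
from scipy.stats import binom

# Simulation sketch of Thm 1.1: a sum of independent B(n_i, p) variables with a
# common p should be B(sum n_i, p). The n_i and p below are arbitrary.
rng = np.random.default_rng(2)
ns, p = [4, 7, 9], 0.35
y = sum(rng.binomial(n, p, size=200_000) for n in ns)

for k in range(5, 10):
    print(k, (y == k).mean(), binom.pmf(k, sum(ns), p))   # columns should agree
```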





Negative Binomial, Geometric distribution

Consider the number of failures until the $r$-th success in a sequence of Bernoulli trials. This defines another interesting distribution called the negative binomial distribution. The special case $r = 1$ is called the geometric distribution.

As we will see, the name “negative” comes from the negative binomial series

\[\begin{align*} (x + a)^{-n} = \sum_{k=0}^\infty \binom{-n}{k} x^k a^{-n - k} \end{align*}\]

where

\[\begin{align*} \binom{-n}{k} &= \frac{-n (-n-1) \cdots (-n - k + 1)}{k!} \\ &= (-1)^k \cdot \binom{n + k - 1}{k} \end{align*}\]

Thus, one may see that

\[\begin{align*} p^{-r} = \sum_{k=0}^\infty \binom{r+k-1}{k} (1 - p)^k \end{align*}\]

PMF

\[\begin{align*} p(y) = \binom{r+y-1}{y} (1 - p)^y p^r \; (y = 0, 1, 2, \cdots) \end{align*}\]

MGF

\[\begin{align*} M(t) = (1 - e^t (1 - p))^{-r} p^r \end{align*}\]
$\mathbf{Proof.}$
\[\begin{align*} M(t) &= \mathbb{E}[e^{tY}] \\ &= \sum_{y = 0}^\infty \binom{r+y-1}{y} (e^t (1 - p))^y p^r \\ &= (1 - e^t (1 - p))^{-r} p^r \end{align*}\]

where the last equality follows from the negative binomial series above.$_\blacksquare$



Mean, Variance

\[\begin{align*} \mathbb{E}[Y] &= \frac{r(1 - p)}{p} \\ \text{Var}(Y) &= \frac{r (1 - p)}{p^2} \end{align*}\]
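
SciPy's `nbinom` uses the same convention (number of failures before the $r$-th success), so its moments provide a quick check of these formulas; the values $r = 5$, $p = 0.4$ are arbitrary.

```python
from scipy.stats import nbinom

# Sketch: scipy's nbinom also counts failures before the r-th success, so its
# moments give a quick check of the formulas above (r = 5, p = 0.4 arbitrary).
r, p = 5, 0.4
mean, var = nbinom.stats(r, p, moments='mv')
print(mean, r * (1 - p) / p)       # 7.5   7.5
print(var, r * (1 - p) / p**2)     # 18.75 18.75
```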

Multinomial distribution

The multinomial distribution is a generalization of the binomial: we extend the binary experiments of the Bernoulli process to experiments with $k$ categories. The joint distribution of the counts $X_1, \cdots, X_{k-1}$ of items in each category is the multinomial distribution. Notice that since the total number of experiments is fixed, the count of one category is determined by the other $k-1$.

PMF

\[\begin{align*} p(x_1, \cdots, x_{k-1}) = \frac{n!}{x_1 ! \cdots x_k !} p_1^{x_1} \cdots p_k^{x_k} \text{ where } x_k = n - \sum_{i=1}^{k-1} x_i, p_k = 1 - \sum_{i=1}^{k-1} p_i \end{align*}\]
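
As a small sanity check of the pmf (the values $n = 6$, $p = (0.2, 0.3, 0.5)$, $x = (1, 2, 3)$ are arbitrary), one can evaluate the formula directly and compare it with `scipy.stats.multinomial`:

```python
from math import factorial
from scipy.stats import multinomial

# Sketch: evaluate the multinomial pmf above at one point and compare with scipy.
# The values n = 6, p = (0.2, 0.3, 0.5), x = (1, 2, 3) are arbitrary.
n, probs, x = 6, [0.2, 0.3, 0.5], [1, 2, 3]

manual = (factorial(n) / (factorial(x[0]) * factorial(x[1]) * factorial(x[2]))
          * probs[0]**x[0] * probs[1]**x[1] * probs[2]**x[2])
print(manual, multinomial.pmf(x, n, probs))   # both ≈ 0.135
```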

MGF

\[\begin{align*} M(t_1, \cdots, t_{k-1}) &= \underset{x_1 + \cdots + x_k = n}{\sum \cdots \sum} \frac{n!}{x_1 ! \cdots x_k !} (e^{t_1} p_1)^{x_1} \cdots (e^{t_{k-1}} p_{k-1})^{x_{k-1}} p_k^{x_k} \\ &= (e^{t_1} p_1 + \cdots + e^{t_{k-1}} p_{k-1} + p_k)^n \end{align*}\]

Mean, Variance, Covariance

\[\begin{align*} \mathbb{E}[X_i] &= n p_i \\ \text{Var}(X_i) &= n p_i (1 - p_i) \\ \text{Cov}(X_i, X_j) &= -n p_i p_j \end{align*}\]
$\mathbf{Proof.}$
\[\begin{align*} &\frac{\partial}{\partial t_i} M (t_1, \cdots, t_{k-1}) = ne^{t_i} p_i (e^{t_1} p_1 + \cdots + e^{t_{k-1}} p_{k-1} + p_k)^{n-1} \\ &\frac{\partial^2}{\partial t_i^2} M (t_1, \cdots, t_{k-1}) = ne^{t_i} p_i (e^{t_1} p_1 + \cdots + e^{t_{k-1}} p_{k-1} + p_k)^{n-1} + n(n-1) e^{2 t_i} p_i^2 (e^{t_1} p_1 + \cdots + e^{t_{k-1}} p_{k-1} + p_k)^{n-2} \\ &\frac{\partial^2}{\partial t_i \partial t_j} M (t_1, \cdots, t_{k-1}) = n (n-1) e^{t_i + t_j} p_i p_j (e^{t_1} p_1 + \cdots + e^{t_{k-1}} p_{k-1} + p_k)^{n-2} \end{align*}\]

Evaluating at $t_1 = \cdots = t_{k-1} = 0$ gives $\mathbb{E}[X_i] = np_i$, $\mathbb{E}[X_i^2] = np_i + n(n-1)p_i^2$, and $\mathbb{E}[X_i X_j] = n(n-1)p_i p_j$, from which the claimed mean, variance, and covariance follow.$_\blacksquare$
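
A short simulation sketch (arbitrary $n$ and $p_i$; NumPy assumed available) estimating $\text{Cov}(X_1, X_2)$ from multinomial samples and comparing it with $-n p_1 p_2$:

```python
import numpy as np

# Sketch: estimate Cov(X_1, X_2) from multinomial samples and compare with
# -n p_1 p_2. The values n = 50, p = (0.2, 0.3, 0.5) are arbitrary.
rng = np.random.default_rng(3)
n, probs = 50, [0.2, 0.3, 0.5]
counts = rng.multinomial(n, probs, size=200_000)       # shape (200000, 3)

emp_cov = np.cov(counts[:, 0], counts[:, 1])[0, 1]
print(emp_cov, -n * probs[0] * probs[1])               # both ≈ -3.0
```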





Hypergeometric distribution

Suppose we have a lot of $N$ items, of which $D \geq 1$ are defective. Let $X$ denote the number of defective items in a sample of size $n$. Then, the random variable $X$ follows a special distribution called the hypergeometric distribution.

The main difference from the binomial distribution is that we sample items without replacement.

PMF

\[\begin{align*} p(x) = \frac{\binom{N - D}{n - x} \binom{D}{x}}{\binom{N}{n}} \; (\max(0, n - (N - D)) \leq x \leq \min(n, D)) \end{align*}\]
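
As a quick check (illustrative parameter values only), this formula can be compared with `scipy.stats.hypergeom`; note that SciPy's parametrization `hypergeom(M, n, N)` corresponds to $(N, D, n)$ in the notation used here.

```python
from math import comb
from scipy.stats import hypergeom

# Sketch: compare the pmf above with scipy. SciPy parametrizes hypergeom(M, n, N)
# with M = population size, n = number of defectives, N = sample size, i.e.
# (N, D, n) in the notation of this post. The values below are arbitrary.
N_pop, D, n = 50, 12, 10
rv = hypergeom(N_pop, D, n)

for x in range(6):
    manual = comb(N_pop - D, n - x) * comb(D, x) / comb(N_pop, n)
    print(x, manual, rv.pmf(x))     # the two columns should match
```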



Mean, Variance

\[\begin{align*} \mathbb{E}[X] &= n \frac{D}{N} \\ \text{Var}[X] &= n \frac{D}{N} \frac{ (N-D)(N-n) }{N (N-1)} \end{align*}\]
$\mathbf{Proof\ 1.}$
\[\begin{align*} \mathbb{E}[X] &= \sum_{x=1}^n x \frac{\binom{N - D}{n - x} \binom{D}{x}}{\binom{N}{n}} \\ &= \frac{nD}{N} \sum_{x=1}^n \frac{\binom{N - D}{n - x} \binom{D-1}{x-1}}{\binom{N-1}{n-1}} \\ &= \frac{nD}{N} \sum_{x=1}^n \frac{\binom{N - 1 - (D - 1)}{(n - 1) - (x - 1)} \binom{D-1}{x-1}}{\binom{N-1}{n-1}} \\ &= n \frac{D}{N} \end{align*}\]

using $x \binom{D}{x} = D \binom{D-1}{x-1}$, $\binom{N}{n} = \frac{N}{n} \binom{N-1}{n-1}$, and the fact that the last sum is the total probability of a hypergeometric distribution with parameters $(N-1, D-1, n-1)$, hence equal to $1$. Similarly,


\[\begin{align*} \mathbb{E}[X(X-1)] &= \sum_{x=2}^n x (x - 1) \frac{\binom{N - D}{n - x} \binom{D}{x}}{\binom{N}{n}} \\ &= \frac{n(n-1) D (D - 1)}{N (N-1)} \sum_{x=2}^n \frac{\binom{N - D}{n - x} \binom{D-2}{x-2}}{\binom{N-2}{n-2}} \\ &= \frac{n(n-1) D (D - 1)}{N (N-1)} \sum_{x=2}^n \frac{\binom{N - 2 - (D - 2)}{(n-2) - (x-2)} \binom{D-2}{x-2}}{\binom{N-2}{n-2}} \\ &= \frac{n(n-1) D (D - 1)}{N (N-1)} \end{align*}\]

so that

\[\begin{align*} \text{Var}[X] &= \mathbb{E}[X(X-1)] + \mathbb{E}[X] - \mathbb{E}[X]^2 \\ &= \frac{n(n-1) D (D - 1)}{N (N-1)} + \frac{nD}{N} - \frac{n^2 D^2}{N^2} = n \frac{D}{N} \frac{ (N-D)(N-n) }{N (N-1)}._\blacksquare \end{align*}\]


$\mathbf{Proof\ 2.}$

Let

\[\begin{align*} X_i = \begin{cases} 1 & \text{ if the } i \text{-th sampled item is defective } \\ 0 & \text{ otherwise } \end{cases} \Rightarrow X = \sum_{i=1}^n X_i \end{align*}\]

Note that $P(X_i = 1) = \frac{D}{N}$ for all $i = 1, \cdots, n$: this is a marginal, not a conditional, probability, since by symmetry each draw is equally likely to be any of the $N$ items. Thus, $\mathbb{E}[X] = n \frac{D}{N}$. For the variance,

\[\begin{align*} \text{Var}[X] &= \text{Var}[X_1 + \cdots + X_n] \\ &= \sum_{i=1}^n \text{Var}[X_i] + 2 \sum_{i < j} \text{Cov}(X_i, X_j) \\ &= \sum_{i=1}^n \frac{D(N - D)}{N^2} + 2 \sum_{i < j} (\mathbb{E}[X_i X_j] - \frac{D^2}{N^2}) \\ &= n \frac{D(N - D)}{N^2} + 2 \sum_{i < j} \frac{D}{N} \frac{D-N}{N(N-1)} \\ &= n \frac{D(N - D)}{N^2} - n (n - 1) \frac{D}{N} \frac{N-D}{N(N-1)} \\ &= n \frac{D}{N} \frac{ (N-D)(N-n) }{N (N-1)} \end{align*}\]

as

\[\begin{align*} \mathbb{E}[X_i X_j] &= P(X_i = 1, X_j = 1) \\ &= P(X_j = 1 | X_i = 1) P (X_i = 1) \\ &= \frac{D-1}{N-1} \cdot \frac{D}{N}._\blacksquare \end{align*}\]
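
Finally, a small simulation sketch of this argument (the values $N = 20$, $D = 6$ are arbitrary; NumPy assumed available): sampling without replacement is modeled by a random permutation of the lot, and the covariance of two indicator variables is compared with $-\frac{D(N-D)}{N^2(N-1)}$.

```python
import numpy as np

# Simulation sketch of Proof 2: model sampling without replacement by a random
# permutation of the lot, and compare the covariance of the indicators of the
# first two draws with -D(N - D) / (N^2 (N - 1)). N = 20, D = 6 are arbitrary.
rng = np.random.default_rng(4)
N_pop, D = 20, 6
lot = np.array([1] * D + [0] * (N_pop - D))     # 1 marks a defective item

draws = np.array([rng.permutation(lot)[:2] for _ in range(200_000)])
emp_cov = np.cov(draws[:, 0], draws[:, 1])[0, 1]
print(emp_cov, -D * (N_pop - D) / (N_pop**2 * (N_pop - 1)))   # ≈ -0.011
```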



