
Convergence of random variables

Before we look into some remarkable theorems in Statistics, we should understand the basic concepts of convergence of random variables.

Convergence in Probability

$\mathbf{Definition.}$ Convergence in Probability
Let \(\{ X_n \}\) be a sequence of random variables, and $X$ be a random variable defined on the same sample space. $X_n$ converges in probability to $X$ if

\[\begin{aligned} \forall \varepsilon > 0, \; \displaystyle \lim_{n \to \infty} P [ | X_n - X | \geq \varepsilon ] = 0 \text{ or, equivalently, } \displaystyle \lim_{n \to \infty} P [ | X_n - X | < \varepsilon ] = 1 \end{aligned}\]

and denote by $X_n \overset{P}{\to} X$.

The Weak Law of Large Numbers (W.L.L.N.) states that the sample mean converges in probability to the true mean, which can be proved by Chebyshev's inequality when the second moment is finite. (Actually, it remains valid as long as $\mathbb{E}[ | X | ]$ is finite, and we will see this later.)

$\mathbf{Thm\ 1.}$ Weak Law of Large Numbers

Let $X_1, \cdots, X_n$ be i.i.d. random variables with common mean $\mu$ and variance $\sigma^2 < \infty$. Then

\[\begin{aligned} \bar{X}_n=\frac{1}{n} \sum_{i=1}^n X_i \overset{P}{\to} \mu \end{aligned}\]
$\mathbf{Proof.}$
Since $\mathrm{Var}(\bar{X}_n) = \sigma^2 / n$, Chebyshev's inequality gives

\[\begin{align*} P\left[\left|\bar{X}_n-\mu\right| \geq \varepsilon\right]=P\left[\left|\bar{X}_n-\mu\right| \geq \frac{\varepsilon \sqrt{n}}{\sigma} \cdot \frac{\sigma}{\sqrt{n}}\right] \leq \frac{\sigma^2}{\varepsilon^2 n} \rightarrow 0 \text { as } n \rightarrow \infty \text {. } \end{align*}\]
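
To see the W.L.L.N. in action, here is a minimal simulation sketch (assuming NumPy is available; the choice of the $\text{Exp}(1)$ distribution, so that $\mu = 1$, is arbitrary): the empirical frequency of the event $\{ | \bar{X}_n - \mu | \geq \varepsilon \}$ shrinks toward $0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 2000          # Exp(1) has mean mu = 1

for n in [10, 100, 1000, 10000]:
    # draw `reps` independent samples of size n and compute their sample means
    means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    # Monte Carlo estimate of P[ |X_bar_n - mu| >= eps ]
    print(n, np.mean(np.abs(means - mu) >= eps))
```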



Like limits of sequences in analysis, convergence in probability obeys the following arithmetic rules.

$\mathbf{Thm\ 2.}$ Arithmetics of converge in probability
Suppose $X_n \overset{P}{\to} X$ and $Y_n \overset{P}{\to} Y$, let $a$ be a constant, and let $g$ be a continuous function.

(a) $X_n + Y_n \overset{P}{\to} X + Y$
(b) $a X_n \overset{P}{\to} aX$
(c) $g (X_n) \overset{P}{\to} g(X)$
(d) $X_n Y_n \overset{P}{\to} XY$

$\mathbf{Proof.}$

(a) Let $\varepsilon > 0$ be given. By the triangle inequality, and since $| X_n - X | + | Y_n - Y | \geq \varepsilon$ forces at least one of the two terms to be at least $\frac{\varepsilon}{2}$, as $n \to \infty$ we have

\[\begin{aligned} P ( | X_n + Y_n - (X + Y) | \geq \varepsilon ) & \leq P ( | X_n - X | + | Y_n - Y | \geq \varepsilon) \\ & \leq P \left( | X_n - X | \geq \frac{\varepsilon}{2} \right) + P \left( | Y_n - Y | \geq \frac{\varepsilon}{2} \right) \to 0. \end{aligned}\]

(b) Clear: for $a \neq 0$, $P ( | a X_n - a X | \geq \varepsilon ) = P ( | X_n - X | \geq \varepsilon / | a | ) \to 0$, and the case $a = 0$ is trivial.
(c) We give the proof for the case where the limit is a constant $a$ and $g$ is continuous at $a$; the statement also holds for a random limit $X$ (see [1]). By continuity of $g$ at $a$, $\exists \delta > 0$ such that $| x - a | < \delta \Rightarrow | g(x) - g(a) | < \varepsilon$. The contrapositive is $| g(x) - g(a) | \geq \varepsilon \Rightarrow | x - a | \geq \delta$. Thus, if $X_n \overset{P}{\to} a$,

\[\begin{aligned} P ( | g(X_n) - g(a) | \geq \varepsilon ) \leq P ( | X_n - a | \geq \delta) \to 0 \text{ as } n \to \infty \end{aligned}\]

(d) Using (a), (b), and (c) with $g(x) = x^2$,

\[\begin{aligned} X_n Y_n & = \frac{1}{2} X_n^2 + \frac{1}{2} Y_n^2 - \frac{1}{2} (X_n - Y_n)^2 \\ & \overset{P}{\to} \frac{1}{2} X^2 + \frac{1}{2} Y^2 - \frac{1}{2} (X - Y)^2 = XY. \end{aligned}\]



Convergence in Distribution

$\mathbf{Definition.}$ Convergence in Distribution
Let \(\{ X_n \}\) be a sequence of random variables, and $X$ be a random variable. Let $F_{X_n}$ and $F_X$ be CDFs of $X_n$ and $X$. Let $C(F_X)$ be the set of all points where $F_X$ is continuous. $X_n$ converges in distribution to $X$ if

\[\begin{aligned} \displaystyle \lim_{n \to \infty} F_{X_n} (x) = F_X (x) \; \forall x \in C(F_X) \end{aligned}\]

and denote by $X_n \overset{D}{\to} X$. We often call the distribution of $X$ the asymptotic (limiting) distribution of \(\{ X_n \}\).

$\mathbf{Remark.}$ Why does the definition consider only points of continuity of $F_X$?
Consider $X_n$ degenerate at $1/n$ and $X$ degenerate at $0$, with CDFs

\[\begin{aligned} F_{X_n}(x) & = \begin{cases}0 & (x<1 / n) \\ 1 & (x \geq 1 / n)\end{cases} \\ F_{X}(x) & = \begin{cases}0 & (x< 0) \\ 1 & (x \geq 0)\end{cases} \end{aligned}\]

$\mathbf{Fig\ 1.}$ $F_{X} (x)$


At the point of discontinuity $x = 0$, $\displaystyle \lim_{n \to \infty} F_{X_n} (0) = 0 \neq F_X (0) = 1$, even though intuitively $X_n \overset{D}{\to} X$. Restricting the requirement to continuity points of $F_X$ resolves this.

Here is an example.

$\mathbf{Example\ 1.}$ Maximum of a sample from a uniform distribution

Let $Y_n = \max(X_1, \cdots, X_n)$, where $X_1, \cdots, X_n$ is a random sample from $\text{Unif}(0, \theta)$. Consider $Z_n = n (\theta - Y_n)$ and let $t \in (0, n\theta)$. Then

\[\begin{aligned} P[Z_n \leq t] = P\left[Y_n \geq \theta - \frac{t}{n}\right] = 1 - \left(\frac{\theta - t/n}{\theta}\right)^n = 1 - \left(1 - \frac{t}{n\theta}\right)^n \to 1 - e^{-\frac{t}{\theta}} \text{ as } n \to \infty \end{aligned}\]

Thus, $Z_n$ converges in distribution to $\Gamma(1, \theta)$, i.e., the exponential distribution with mean $\theta$.
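
As a quick check, the sketch below (NumPy assumed; $\theta = 2$ and $n = 500$ are arbitrary choices) compares the empirical CDF of $Z_n = n(\theta - Y_n)$ against the $\Gamma(1, \theta)$ limit $1 - e^{-t/\theta}$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 500, 10000

# reps replications of Z_n = n * (theta - max(X_1, ..., X_n)), X_i ~ Unif(0, theta)
samples = rng.uniform(0.0, theta, size=(reps, n))
z = n * (theta - samples.max(axis=1))

for t in [0.5, 1.0, 2.0, 4.0]:
    empirical = np.mean(z <= t)
    limit = 1.0 - np.exp(-t / theta)     # CDF of Gamma(1, theta) at t
    print(f"t = {t}: empirical {empirical:.4f} vs limit {limit:.4f}")
```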



$\mathbf{Thm\ 3.}$ $X_n \overset{P}{\to} X$ $\Rightarrow$ $X_n \overset{D}{\to} X$. The converse is not true in general.

$\mathbf{Proof.}$
Let $x$ be a continuity point of $F_X$ and let $\varepsilon > 0$. Then

\[\begin{aligned} P\left(X_n \leq x\right) & = P\left(X_n \leq x,\left|X_n-X\right| \leq \varepsilon\right)+P\left(X_n \leq x, \left|X_n-X\right|>\varepsilon\right) \\ & \leq P(X \leq x+\varepsilon)+P\left(\left|X_n-X\right|>\varepsilon\right) \\ P\left(X_n \leq x\right) & \geq P\left(X_n \leq x,\left|X_n-X\right| \leq \varepsilon\right) \\ & \geq P\left(X \leq x-\varepsilon, \left|X_n-X\right| \leq \varepsilon\right) \\ & \geq P(X \leq x-\varepsilon)-P\left(\left|X_n-X\right|>\varepsilon\right) \end{aligned}\]

Since $P\left(\left|X_n-X\right|>\varepsilon\right) \to 0$, this gives $P(X \leq x-\varepsilon) \leq \liminf _{n \rightarrow \infty} P\left(X_n \leq x\right) \leq \limsup _{n \rightarrow \infty} P\left(X_n \leq x\right) \leq P(X \leq x+\varepsilon)$. Letting $\varepsilon \downarrow 0$ and using the continuity of $F_X$ at $x$ yields $\lim_{n \to \infty} F_{X_n} (x) = F_X (x)$.

However, the converse is not true in general. Here is a counterexample. Let the PDF $f_X (x)$ of $X$ be symmetric about $0$, i.e., the PDF of $-X$ is also $f_X$. Define

\[\begin{aligned} X_n=\left\{\begin{array}{cc} X & n \text { odd } \\ -X & n \text { even } \end{array}\right. \end{aligned}\]

Clearly, $F_{X_n} (x) = F_X (x)$ for all $x$, so $X_n \overset{D}{\to} X$. But for even $n$, $| X_n - X | = 2 | X |$, so $P[ | X_n - X | \geq \varepsilon ] = P[ | X | \geq \varepsilon / 2 ] \not\to 0$ unless $X$ is degenerate at $0$; hence $X_n \overset{P}{\not\to} X$ in general.
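
A numeric sketch of why convergence in probability fails here (taking $X \sim N(0,1)$, which is symmetric about $0$, purely for concreteness): for even $n$ we have $| X_n - X | = 2 | X |$, so the deviation probability never shrinks.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)
eps = 0.5

# For even n, X_n = -X, so |X_n - X| = 2|X|.
# This probability stays near P(|X| >= 0.25) ~ 0.80 no matter how large n is.
print(np.mean(2 * np.abs(x) >= eps))
```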



But if $X$ is a degenerate (constant) random variable, the converse does hold:

$\mathbf{Thm\ 4.}$ $X_n \overset{P}{\to} a$ $\Leftrightarrow$ $X_n \overset{D}{\to} a$.

$\mathbf{Proof.}$

($\Rightarrow$) follows from $\mathbf{Thm\ 3}$. For ($\Leftarrow$), let $\varepsilon > 0$ be given. Since $a + \varepsilon$ and $a - \varepsilon$ are continuity points of the CDF of the constant $a$,

\[\begin{aligned} \liminf_{n \to \infty} P [ | X_n - a | \leq \varepsilon ] & \geq \displaystyle \lim_{n \to \infty} F_{X_n} (a + \varepsilon) - \displaystyle \lim_{n \to \infty} F_{X_n} ( a - \varepsilon ) \\ & = 1 - 0 = 1, \end{aligned}\]

so $\displaystyle \lim_{n \to \infty} P [ | X_n - a | \leq \varepsilon ] = 1$, i.e., $X_n \overset{P}{\to} a$.



$\mathbf{Thm\ 5.}$ $X_n \overset{D}{\to} X, Y_n \overset{P}{\to} 0$ $\Rightarrow$ $X_n + Y_n \overset{D}{\to} X$

$\mathbf{Proof.}$
\[\begin{aligned} P (X_n + Y_n \leq z) & = P (X_n + Y_n \leq z, |Y_n| < \varepsilon) + P(X_n + Y_n \leq z, | Y_n | \geq \varepsilon ) \\ & \leq P(X_n \leq z + \varepsilon) + P (|Y_n| \geq \varepsilon) \end{aligned}\]

Similarly,

\[\begin{aligned} P (X_n + Y_n > z) & = P (X_n + Y_n > z, |Y_n| < \varepsilon) + P(X_n + Y_n > z, | Y_n | \geq \varepsilon ) \\ & \leq P(X_n > z - \varepsilon) + P (|Y_n| \geq \varepsilon) \end{aligned}\]

Choosing $\varepsilon$ so that $z \pm \varepsilon$ are also continuity points of $F_X$, taking limits as $n \to \infty$, and then letting $\varepsilon \downarrow 0$, the sandwich theorem gives $\lim_{n \to \infty} P(X_n + Y_n \leq z) = F_X(z)$ at every continuity point $z$ of $F_X$.


We often use this last result as follows. Suppose it is difficult to show that $X_n$ converges to $X$ in distribution, but it is easy to show that $Y_n$ converges in distribution to $X$ and that $X_n - Y_n$ converges to $0$ in probability. Then, by this last theorem, $X_n = Y_n + (X_n - Y_n) \overset{D}{\to} X$, as desired.

$\mathbf{Thm\ 6.}$ Suppose $X_n \overset{D}{\to} X$ and $g$ is continuous on the support of $X$. Then,

\[\begin{aligned} g(X_n) \overset{D}{\to} g(X) \end{aligned}\]


$\mathbf{Thm\ 7.}$ Arithmetic of $\overset{P}{\to}$ and $\overset{D}{\to}$ (Slutsky’s Theorem)
Suppose $X_n \overset{D}{\to} X$, $A_n \overset{P}{\to} a$, and $B_n \overset{P}{\to} b$. Then,

\[\begin{aligned} A_n + B_n X_n \overset{D}{\to} a + bX \end{aligned}\]
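
A minimal simulation sketch of Slutsky's theorem (NumPy assumed; the choices $X_n \sim N(0,1)$ exactly, $A_n = \bar{U}_n$ with $U_i \sim \text{Unif}(0,2)$ so $a = 1$, and $B_n = \bar{V}_n$ with $V_i \sim \text{Unif}(0,4)$ so $b = 2$, are purely illustrative): the distribution of $A_n + B_n X_n$ should approach that of $a + bX \sim N(1, 4)$.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 5000

a_n = rng.uniform(0, 2, size=(reps, n)).mean(axis=1)   # A_n --P--> a = 1
b_n = rng.uniform(0, 4, size=(reps, n)).mean(axis=1)   # B_n --P--> b = 2
x_n = rng.standard_normal(reps)                        # X_n --D--> N(0, 1)

w = a_n + b_n * x_n                                    # should be close to N(1, 2^2) for large n

def normal_cdf(t, loc, scale):
    # CDF of N(loc, scale^2) via the error function
    return 0.5 * (1.0 + math.erf((t - loc) / (scale * math.sqrt(2.0))))

for t in [-3.0, -1.0, 1.0, 3.0]:
    print(f"t = {t}: empirical {np.mean(w <= t):.4f} vs N(1,4) {normal_cdf(t, 1.0, 2.0):.4f}")
```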

Bounded in Probability

$\mathbf{Definition.}$ Bounded in Probability
We say that \(\{ X_n \}\) is bounded in probability if for $\forall \varepsilon > 0$, $\exists B_\varepsilon > 0$ and $N_\varepsilon \in \mathbb{N}$ such that

\[\begin{aligned} n \geq N_{\varepsilon} \Longrightarrow P\left[\left|X_n\right| \leq B_{\varepsilon}\right] \geq 1-\varepsilon \end{aligned}\]

or, equivalently,

\[\begin{aligned} n \geq N_{\varepsilon} \Longrightarrow P\left[\left|X_n\right| > B_{\varepsilon}\right] < \varepsilon \end{aligned}\]


Come to think of it, many probability distributions have unbounded support. For the standard normal distribution $N(0,1)$, for instance, the probability of drawing a value as large as $10^{100}$ is astronomically small, but not exactly $0$.
However, the above definition lets us speak of the boundedness of a sequence of random variables in a probabilistic sense.

$\mathbf{Note.}$ Every random variable $X$ is bounded in probability.
For any $\varepsilon > 0$, we can find $\eta_1 < \eta_2$ such that

\[\begin{aligned} F_X (\eta_1) = \frac{\varepsilon}{2}, F_X (\eta_2) = 1 - \frac{\varepsilon}{2} \end{aligned}\]

Then, for $\eta = \max(|\eta_1|, |\eta_2|)$,

\[\begin{aligned} P[ | X | \leq \eta ] \geq F_X (\eta) - F_X (- \eta) \geq F_X (\eta_2) - F_X (\eta_1) = 1 - \varepsilon \end{aligned}\]
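
For a concrete instance of this construction, the sketch below (SciPy's `scipy.stats.norm` assumed; $X \sim N(0,1)$ and $\varepsilon = 0.05$ are arbitrary choices) picks $\eta_1, \eta_2$ via the quantile function and verifies $P[ | X | \leq \eta ] \geq 1 - \varepsilon$.

```python
from scipy.stats import norm

eps = 0.05
eta1 = norm.ppf(eps / 2)          # F_X(eta1) = eps / 2
eta2 = norm.ppf(1 - eps / 2)      # F_X(eta2) = 1 - eps / 2
eta = max(abs(eta1), abs(eta2))

# P[|X| <= eta] = F_X(eta) - F_X(-eta) >= 1 - eps
print(eta, norm.cdf(eta) - norm.cdf(-eta))
```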


Thus, the following theorem becomes obvious:
$\mathbf{Thm\ 8.}$ $X_n \overset{D}{\longrightarrow} X$ $\Longrightarrow$ \(\{ X_n \}\) bounded in probability.

$\mathbf{Proof.}$

Let $\varepsilon > 0$ be given. As in the note above, choose $\eta > 0$ such that $\pm \eta$ are continuity points of $F_X$ and $F_X(\eta) - F_X(-\eta) > 1 - \frac{\varepsilon}{2}$. Then

\[\begin{aligned} \lim_{n \rightarrow \infty} \left( F_{X_n}(\eta) - F_{X_n}(-\eta) \right) = F_X(\eta)-F_X(-\eta) > 1-\frac{\varepsilon}{2}. \end{aligned}\]

So there exists $N_{\varepsilon}$ such that $P [ | X_n | \leq \eta ] \geq F_{X_n}(\eta) - F_{X_n}(-\eta) \geq 1 - \varepsilon$ whenever $n \geq N_{\varepsilon}$.


Reference

[1] Hogg, R., McKean, J. & Craig, A., Introduction to Mathematical Statistics, Pearson 2019
