[Statistics] Multivariate Distributions - Part I
(This post is a summary of Chapter 2 of [1])
2.1 Distributions of Two Random Variables
In this section, we consider a pair of random variables $(X_1, X_2)$ which we call a random vector. We explore cumulative distribution functions and probability density functions, as before. We’ll also define and illustrate marginal distributions, expectation, and moment generating functions.
To introduce random vectors, consider the experiment of tossing a coin three times, so that the sample space is \(\mathcal{C} = \{ TTT, TTH, THT, HTT, THH, HTH, HHT, HHH \}\).
Let $X_1$ be the random variable that denotes the number of $H$’s on the first two tosses and let $X_2$ be the random variable that denotes the number of $H$’s on all three tosses.
We then consider the pair of random variables $(X_1, X_2)$. The space $\mathcal{D}$ of $(X_1, X_2)$ is \(\mathcal{D} = \{ (0,0), (0,1), (1,1), (1,2), (2,2), (2,3) \}\).
$\mathbf{Def\ 2.1.1}$ (Random Vector).
Given a random experiment with a sample space $\mathcal{C}$, consider two random variables $X_1$ and $X_2$, which assign to each element $c$ of $\mathcal{C}$ one and only one ordered pair of numbers $X_1 (c) = x_1$, $X_2 (c) = x_2$. Then we say that $(X_1, X_2)$ is a random vector. The space of $(X_1, X_2)$ is the set of ordered pairs \(\mathcal{D} = \{ (x_1, x_2) : x_1 = X_1 (c), x_2 = X_2 (c), c \in \mathcal{C} \}\).
$\mathbf{Definition.}$ (Joint cumulative distribution function).
Let $\mathcal{D}$ be the space associated with the random vector $(X_1, X_2)$. For $A \subset \mathcal{D}$ we call $A$ an event. The cdf of $(X_1, X_2)$ is
\[\begin{aligned} F_{X_1, X_2} (x_1, x_2) = P[\{X_1 \leq x_1\} \cap \{X_2 \leq x_2\}] \end{aligned}\]
for $(x_1, x_2) \in \mathbb{R}^2$. This is the joint cumulative distribution function of $(X_1, X_2)$.
$\mathbf{Note.}$ $P[a_1 < X_1 \leq b_1, a_2 < X_2 \leq b_2] = F_{X_1, X_2} (b_1, b_2) - F_{X_1, X_2} (a_1, b_2) - F_{X_1, X_2} (b_1, a_2) + F_{X_1, X_2} (a_1, a_2)$.
$\mathbf{Proof.}$ The event $\{a_1 < X_1 \leq b_1, a_2 < X_2 \leq b_2\}$ is obtained from $\{X_1 \leq b_1, X_2 \leq b_2\}$ by removing $\{X_1 \leq a_1, X_2 \leq b_2\}$ and $\{X_1 \leq b_1, X_2 \leq a_2\}$, whose intersection $\{X_1 \leq a_1, X_2 \leq a_2\}$ would otherwise be removed twice. Taking probabilities and applying inclusion-exclusion gives the stated identity. $_\blacksquare$
$\mathbf{Definition.}$ (Discrete random vector)
A random vector $(X_1, X_2)$ is a discrete random vector if its space $\mathcal{D}$ is finite or countable. (Hence, $X_1$ and $X_2$ are both discrete also.) The joint probability mass function of $(X_1, X_2)$ is defined by
\[\begin{aligned} p_{X_1, X_2} (x_1, x_2) = P[X_1 = x_1, X_2 = x_2] \end{aligned}\]
for all $(x_1, x_2) \in \mathcal{D}$.
$\mathbf{Example\ 2.1.1.}$
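As an illustrative sketch (not the book's example), return to the coin-tossing experiment above and assume the coin is fair, so that each of the eight outcomes in $\mathcal{C}$ has probability $1/8$. Counting outcomes gives the joint pmf
\[\begin{aligned} p_{X_1, X_2} (0,0) = \tfrac{1}{8}, \quad p_{X_1, X_2} (0,1) = \tfrac{1}{8}, \quad p_{X_1, X_2} (1,1) = \tfrac{2}{8}, \quad p_{X_1, X_2} (1,2) = \tfrac{2}{8}, \quad p_{X_1, X_2} (2,2) = \tfrac{1}{8}, \quad p_{X_1, X_2} (2,3) = \tfrac{1}{8}, \end{aligned}\]
and zero elsewhere; the six probabilities sum to $1$, as a pmf must.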
$\mathbf{Definition.}$ (Continuous random vector).
If the cdf $F_{X_1, X_2}$ is continuous, then the random vector $(X_1, X_2)$ is said to be continuous. For the most part, the continuous random vectors in this book have cdfs that can be represented as integrals of nonnegative functions:
\[\begin{aligned} F_{X_1, X_2} (x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f_{X_1, X_2} (w_1, w_2) \, dw_2 dw_1 \end{aligned}\]
for all $(x_1, x_2) \in \mathbb{R}^2$. We call the integrand the joint probability density function of $(X_1, X_2)$.
$\mathbf{Note.}$ $\frac{\partial^2 F_{X_1, X_2} (x_1, x_2)}{\partial x_1 \partial x_2} = f_{X_1, X_2} (x_1, x_2)$, except possibly on events that have probability zero.
$\mathbf{Example\ 2.1.2., 2.1.3.}$
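As a quick illustrative sketch (with a pdf chosen for convenience, not taken from [1]), let $f_{X_1, X_2} (x_1, x_2) = 4 x_1 x_2$ for $0 < x_1 < 1$, $0 < x_2 < 1$, and zero elsewhere. Then, for instance,
\[\begin{aligned} P\left[ 0 < X_1 \leq \tfrac{1}{2}, \, 0 < X_2 \leq \tfrac{1}{2} \right] = \int_{0}^{1/2} \int_{0}^{1/2} 4 x_1 x_2 \, dx_2 dx_1 = 4 \cdot \tfrac{1}{8} \cdot \tfrac{1}{8} = \tfrac{1}{16}. \end{aligned}\]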
As in the two examples, we extend the definition of a pdf $f_{X_1, X_2} (x_1, x_2)$ over $\mathbb{R}^2$ by setting it to $0$ elsewhere. We do this consistently so that tedious, repetitious references to the space $\mathcal{D}$ can be avoided. Once this is done, we may replace integrals (or sums) over $\mathcal{D}$ by integrals (or sums) over $\mathbb{R}^2$.
2.1.1 Marginal Distributions
Recall that the event which defined the cdf of $X_1$ at $x_1$ is \(\{ X_1 \leq x_1 \}\). But, \(\{ X_1 \leq x_1 \} = \{ X_1 \leq x_1 \} \cap \{ -\infty < X_2 < \infty \} = \{ X_1 \leq x_1 , -\infty < X_2 < \infty \}\), clearly.
Then, taking probabilities, we have
\[\begin{aligned} F_{X_1} (x_1) = P[X_1 \leq x_1, -\infty < X_2 < \infty] \end{aligned}\]
for all $x_1 \in \mathbb{R}$. By $\mathbf{Thm\ 1.3.6}$ in the previous post, we can write this equation as $F_{X_1} (x_1) = \displaystyle \lim_{x_2 \to \infty} F_{X_1, X_2}(x_1, x_2)$. Thus we have a relationship between the cdfs, which we can extend to either the pmf or the pdf, depending on whether $(X_1, X_2)$ is discrete or continuous.
(1) discrete case
\[\begin{aligned} F_{X_1} (x_1) = \sum_{w_1 \leq x_1} \sum_{x_2 < \infty} p_{X_1, X_2} (w_1, x_2) = \sum_{w_1 \leq x_1} \left\{ \sum_{x_2 < \infty} p_{X_1, X_2} (w_1, x_2) \right\}. \end{aligned}\]
By the uniqueness of cdfs, the quantity in braces must be the pmf of $X_1$ evaluated at $w_1$; that is,
\[\begin{aligned} p_{X_1} (x_1) = \sum_{x_2 < \infty} p_{X_1, X_2} (x_1, x_2) \end{aligned}\]
for all \(x_1 \in \mathcal{D}_{X_1}\).
$\mathbf{Note.}$ In terms of a tabled joint pmf with rows indexed by the support values of $X_1$ and columns indexed by the support values of $X_2$, this says that the distribution of $X_1$ can be obtained by the marginal sums of the rows. Likewise, the pmf of $X_2$ can be obtained by the marginal sums of the columns. (And that is why we call these marginal distributions.)
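Continuing the fair-coin sketch above, the marginal pmfs are obtained by exactly these marginal sums:
\[\begin{aligned} p_{X_1}(0) = \tfrac{1}{8} + \tfrac{1}{8} = \tfrac{1}{4}, \quad p_{X_1}(1) = \tfrac{2}{8} + \tfrac{2}{8} = \tfrac{1}{2}, \quad p_{X_1}(2) = \tfrac{1}{8} + \tfrac{1}{8} = \tfrac{1}{4}, \end{aligned}\]
\[\begin{aligned} p_{X_2}(0) = \tfrac{1}{8}, \quad p_{X_2}(1) = \tfrac{3}{8}, \quad p_{X_2}(2) = \tfrac{3}{8}, \quad p_{X_2}(3) = \tfrac{1}{8}, \end{aligned}\]
which are the $b(2, 1/2)$ and $b(3, 1/2)$ binomial pmfs one would expect for these counts of heads.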
(2) continuous case
\[\begin{aligned} F_{X_1} (x_1) = \int_{-\infty}^{x_1} \int_{-\infty}^{\infty} f_{X_1, X_2} (w_1, x_2) \, dx_2 dw_1 = \int_{-\infty}^{x_1} \left\{ \int_{-\infty}^{\infty} f_{X_1, X_2} (w_1, x_2) \, dx_2 \right\} dw_1. \end{aligned}\]
By the uniqueness of cdfs, the quantity in braces must be the pdf of $X_1$ evaluated at $w_1$; that is,
\[\begin{aligned} f_{X_1} (x_1) = \int_{-\infty}^{\infty} f_{X_1, X_2} (x_1, x_2) \, dx_2 \end{aligned}\]
for all \(x_1 \in \mathcal{D}_{X_1}\).
$\mathbf{Example\ 2.1.5.}$
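As a continuous counterpart (using the illustrative pdf $f_{X_1, X_2}(x_1, x_2) = 4 x_1 x_2$ on the unit square introduced earlier, not an example from [1]), the marginal pdf of $X_1$ is
\[\begin{aligned} f_{X_1} (x_1) = \int_{0}^{1} 4 x_1 x_2 \, dx_2 = 2 x_1, \quad 0 < x_1 < 1, \end{aligned}\]
and zero elsewhere; by symmetry, $f_{X_2}(x_2) = 2 x_2$ on $(0,1)$.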
2.1.2 Expectation
$\mathbf{Definition.}$ (Expectation). Let $(X_1, X_2)$ be a continuous random vector and let $Y = g(X_1, X_2)$ for some real-valued function $g$. If
\[\begin{aligned} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |g(x_1, x_2)| f_{X_1, X_2} (x_1, x_2) \, dx_1 dx_2 < \infty, \end{aligned}\]
then the expectation of $Y$ is defined as
\[\begin{aligned} \mathbb{E}(Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x_1, x_2) f_{X_1, X_2} (x_1, x_2) \, dx_1 dx_2. \end{aligned}\]
Similarly, if $(X_1, X_2)$ is discrete and
\[\begin{aligned} \sum_{x_2} \sum_{x_1} |g(x_1, x_2)| p_{X_1, X_2} (x_1, x_2) < \infty, \end{aligned}\]
then the expectation of $Y$ is defined as
\[\begin{aligned} \mathbb{E}(Y) = \sum_{x_2} \sum_{x_1} g(x_1, x_2) p_{X_1, X_2} (x_1, x_2). \end{aligned}\]
Also note that the expected value of any function $g(X_2)$ of $X_2$ can be found in two equivalent ways:
\[\begin{aligned} \mathbb{E}[g(X_2)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x_2) f_{X_1, X_2} (x_1, x_2) \, dx_1 dx_2 = \int_{-\infty}^{\infty} g(x_2) f_{X_2} (x_2) \, dx_2. \end{aligned}\]
Based on the linearity of sums and integrals, the following fact is immediate:
$\mathbf{Thm\ 2.1.1.}$ Let $(X_1, X_2)$ be a random vector. Let $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2 (X_1, X_2)$ be random variables whose expectations exist. Then for all real numbers $k_1$ and $k_2$,
\[\begin{aligned} \mathbb{E}(k_1 Y_1 + k_2 Y_2) = k_1 \mathbb{E}(Y_1) + k_2 \mathbb{E}(Y_2). \end{aligned}\]
$\mathbf{Def\ 2.1.2}$ (Moment Generating Function of a random vector). Let $\mathbf{X} = (X_1, X_2)$ be a random vector. If $\mathbb{E}(e^{t_1 X_1 + t_2 X_2})$ exists for $|t_1 | < h_1$ and $|t_2 | < h_2$, where $h_1$ and $h_2$ are positive, it is denoted by $M_{X_1, X_2} (t_1, t_2)$ and is called the moment generating function (mgf) of $\mathbf{X}$.
$\mathbf{Remark.}$ Let $\mathbf{t} = (t_1, t_2)'$. Then we can write the mgf of $\mathbf{X}$ as $M_{X_1, X_2} (\mathbf{t}) = \mathbb{E}[e^{\mathbf{t}' \mathbf{X}}]$, so it is quite similar to the mgf of a random variable. Also, the mgfs of $X_1$ and $X_2$ are immediately seen to be $M_{X_1, X_2} (t_1, 0)$ and $M_{X_1,X_2} (0,t_2)$, respectively.
$\mathbf{Example\ 2.1.10.}$
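For a small illustrative computation (the pdf is chosen for convenience and is not the book's Example 2.1.10), let $f_{X_1, X_2}(x_1, x_2) = e^{-x_1 - x_2}$ for $x_1 > 0$, $x_2 > 0$, and zero elsewhere. Then, for $t_1 < 1$ and $t_2 < 1$,
\[\begin{aligned} M_{X_1, X_2} (t_1, t_2) = \int_{0}^{\infty} \int_{0}^{\infty} e^{t_1 x_1 + t_2 x_2} e^{-x_1 - x_2} \, dx_1 dx_2 = \frac{1}{(1 - t_1)(1 - t_2)}, \end{aligned}\]
and the marginal mgfs are $M_{X_1, X_2}(t_1, 0) = (1 - t_1)^{-1}$ and $M_{X_1, X_2}(0, t_2) = (1 - t_2)^{-1}$, as noted in the remark above.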
We also need to define the expected value of the random vector itself, but this is not a new concept because it is defined in terms of componentwise expectation:
$\mathbf{Def\ 2.1.3}$ (Expected Value of a random vector). Let $\mathbf{X} = (X_1, X_2)$ be a random vector. Then the expected value of $\mathbf{X}$ exists if the expectations of $X_1$ and $X_2$ exist. If it exists, then the expected value is given by
\[\begin{aligned} \mathbb{E}[\mathbf{X}] = \begin{bmatrix} \mathbb{E}(X_1) \\ \mathbb{E}(X_2) \end{bmatrix}. \end{aligned}\]
2.2 Transformations: Bivariate Random Variables
Let $(X_1, X_2)$ be a random vector. Suppose we know the joint distribution of $(X_1,X_2)$ and we seek the distribution of a transformation of $(X_1,X_2)$, say, $Y = g(X_1,X_2)$.
Discrete case
Let $p_{X_1, X_2} (x_1, x_2)$ be the joint pmf of two discrete-type random variables $X_1$ and $X_2$ with $\mathbf{S}$ the (2D) set of points at which $p_{X_1, X_2} (x_1, x_2) > 0$; i.e., $\mathbf{S}$ is the support of $(X_1, X_2)$. Let $y_1 = u_1 (x_1, x_2)$ and $y_2 = u_2 (x_1, x_2)$ define a one-to-one transformation that maps $\mathbf{S}$ onto $\mathbf{T}$.
The joint pmf of the two new random variables $Y_1 = u_1 (X_1, X_2)$ and $Y_2 = u_2 (X_1, X_2)$ is given by
\[\begin{aligned} p_{Y_1, Y_2} (y_1, y_2) = p_{X_1, X_2} [w_1 (y_1, y_2), w_2 (y_1, y_2)], \quad (y_1, y_2) \in \mathbf{T}, \end{aligned}\]
where $x_i = w_i (y_1, y_2)$, $i = 1, 2$, is the single-valued inverse of $y_i = u_i (x_1, x_2)$. From this joint pmf $p_{Y_1,Y_2} (y_1,y_2)$ we may obtain the marginal pmf of $Y_1$ by summing on $y_2$, or the marginal pmf of $Y_2$ by summing on $y_1$.
$\mathbf{Example\ 2.2.1.}$
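As an illustrative sketch (not the book's example), let $p_{X_1, X_2}(x_1, x_2) = \tfrac{1}{4}$ for $(x_1, x_2) \in \{0, 1\}^2$ and consider $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$. The inverse transformation is $x_1 = (y_1 + y_2)/2$, $x_2 = (y_1 - y_2)/2$, so
\[\begin{aligned} p_{Y_1, Y_2} (y_1, y_2) = p_{X_1, X_2} \left( \tfrac{y_1 + y_2}{2}, \tfrac{y_1 - y_2}{2} \right) = \tfrac{1}{4}, \quad (y_1, y_2) \in \mathbf{T} = \{ (0,0), (1,1), (1,-1), (2,0) \}, \end{aligned}\]
and summing on $y_2$ gives the marginal pmf $p_{Y_1}(0) = \tfrac{1}{4}$, $p_{Y_1}(1) = \tfrac{1}{2}$, $p_{Y_1}(2) = \tfrac{1}{4}$.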
Continuous case
Let $f_{X_1, X_2} (x_1, x_2)$ be the joint pdf of two continuous-type random variables $X_1$ and $X_2$ with support set $\mathbf{S}$. Consider the transformed random vector $(Y_1, Y_2) = T(X_1, X_2)$, where $T$ is a one-to-one continuous transformation. Let $\mathbf{T} = T(\mathbf{S})$ denote the support of $(Y_1, Y_2)$.
Rewrite the transformation in terms of its components as $(Y_1,Y_2) = T(X_1, X_2) = (u_1 (X_1,X_2), u_2(X_1,X_2))$, where the functions $y_1 = u_1 (x_1, x_2)$ and $y_2 = u_2 (x_1, x_2)$ define $T$. Since the transformation is one-to-one, the inverse $T^{-1}$ exists. We write it as $x_1 = w_1 (y_1, y_2)$, $x_2 = w_2 (y_1, y_2)$. Finally, we need the Jacobian of the transformation, which is the determinant of order 2 given by
\[\begin{aligned} J = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{vmatrix}. \end{aligned}\]
Note that $J$ plays the role of $dx/dy$ in the univariate case. We assume that these first-order partial derivatives are continuous and that the Jacobian $J$ is not identically equal to zero in $\mathbf{T}$.
Let $B$ be any region in $\mathbf{T}$ and let $A = T^{-1} (B)$. Since $T$ is one-to-one, $P[(X_1, X_2) \in A] = P[T(X_1, X_2) \in T(A)] = P[(Y_1, Y_2) \in B]$. Then, by the change-of-variables formula for double integrals, we have
\[\begin{aligned} P[(Y_1, Y_2) \in B] = P[(X_1, X_2) \in A] = \iint_A f_{X_1, X_2} (x_1, x_2) \, dx_1 dx_2 = \iint_B f_{X_1, X_2} [w_1 (y_1, y_2), w_2 (y_1, y_2)] \, |J| \, dy_1 dy_2. \end{aligned}\]
As $B$ is arbitrary, the last integrand must be the joint pdf of $(Y_1, Y_2)$. That is, the pdf of $(Y_1, Y_2)$ is
\[\begin{aligned} f_{Y_1, Y_2} (y_1, y_2) = f_{X_1, X_2} [w_1 (y_1, y_2), w_2 (y_1, y_2)] \, |J|, \quad (y_1, y_2) \in \mathbf{T}, \end{aligned}\]
and zero elsewhere.
$\mathbf{Example\ 2.2.3.}$
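Here is a small worked sketch (with an illustrative pdf, not the book's example): let $f_{X_1, X_2}(x_1, x_2) = e^{-x_1 - x_2}$ on $\mathbf{S} = \{(x_1, x_2) : x_1 > 0, x_2 > 0\}$ and take $Y_1 = X_1 + X_2$, $Y_2 = X_1$. The inverse is $x_1 = y_2$, $x_2 = y_1 - y_2$, so
\[\begin{aligned} J = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1, \qquad f_{Y_1, Y_2} (y_1, y_2) = e^{-y_2 - (y_1 - y_2)} \cdot |{-1}| = e^{-y_1}, \quad 0 < y_2 < y_1 < \infty, \end{aligned}\]
and integrating out $y_2$ gives $f_{Y_1}(y_1) = y_1 e^{-y_1}$ for $y_1 > 0$, a $\Gamma(2, 1)$ density.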
MGF techniques
In addition to the change-of-variable and cdf techniques for finding distributions of functions of random variables, there is another method, called the moment generating function (mgf) technique, which works well for linear functions of random variables.
In subsection 2.1.2, we pointed out that if $Y = g(X_1,X_2)$, then $\mathbb{E}(Y)$, if it exists, could be found by $E(Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x_1, x_2) f_{X_1, X_2} (x_1, x_2) dx_1 dx_2$ in the continuous case, with summations replacing integrals in the discrete case.
Certainly, that function $g$ could be $e^{tu(X_1, X_2)}$, so that in reality we would be finding the mgf of the function $Z = u(X_1, X_2)$. If we could then recognize this mgf as belonging to a certain distribution, then $Z$ would have that distribution!
$\mathbf{Example\ 2.2.6.}$
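To illustrate with the same hypothetical pdf $f_{X_1, X_2}(x_1, x_2) = e^{-x_1 - x_2}$, $x_1, x_2 > 0$, used above, let $Z = X_1 + X_2$. Then, for $t < 1$,
\[\begin{aligned} M_Z (t) = \mathbb{E}[e^{t(X_1 + X_2)}] = \int_{0}^{\infty} \int_{0}^{\infty} e^{t(x_1 + x_2)} e^{-x_1 - x_2} \, dx_1 dx_2 = \frac{1}{(1 - t)^2}, \end{aligned}\]
which we recognize as the mgf of a $\Gamma(2, 1)$ distribution; hence $Z \sim \Gamma(2, 1)$, in agreement with the change-of-variable sketch above.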
2.3 Conditional Distributions and Expectations
In this section, we discuss conditional distributions, i.e., the distribution of one of the random variables when the other has assumed a specific value.
$\mathbf{Definition.}$ (Conditional pmf).
Let $X_1$ and $X_2$ denote random variables of the discrete type, which have the joint pmf $p_{X_1, X_2} (x_1,x_2)$ that is positive on the support set $\mathbf{S}$ and is zero elsewhere. Let $p_{X_1} (x_1)$ and $p_{X_2} (x_2)$ denote, respectively, the marginal probability mass functions of $X_1$ and $X_2$. Let $x_1$ be a point in the support of $X_1$, so that $p_{X_1} (x_1) > 0$. Using the definition of conditional probability, we have
\[\begin{aligned} P[X_2 = x_2 | X_1 = x_1] = \frac{P[X_1 = x_1, X_2 = x_2]}{P[X_1 = x_1]} = \frac{p_{X_1, X_2} (x_1, x_2)}{p_{X_1} (x_1)} \end{aligned}\]
for all $x_2$ in the support of $X_2$. Define this function as
\[\begin{aligned} p_{X_2 | X_1} (x_2 | x_1) = \frac{p_{X_1, X_2} (x_1, x_2)}{p_{X_1} (x_1)}. \end{aligned}\]
We call $p_{X_2 | X_1} (x_2 | x_1)$ the conditional pmf of the discrete random variable $X_2$, given that the discrete random variable $X_1 = x_1$.
$\mathbf{Note.}$ For any fixed $x_1$ in the support of $X_1$, this function satisfies the conditions of being a pmf of the discrete type because it is nonnegative and
\[\begin{aligned} \sum_{x_2} p_{X_2 | X_1} (x_2 | x_1) = \sum_{x_2} \frac{p_{X_1, X_2} (x_1, x_2)}{p_{X_1} (x_1)} = \frac{p_{X_1} (x_1)}{p_{X_1} (x_1)} = 1. \end{aligned}\]
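Continuing the fair-coin sketch from Section 2.1, given $X_1 = 1$ we have $p_{X_1}(1) = \tfrac{1}{2}$, so
\[\begin{aligned} p_{X_2 | X_1} (1 | 1) = \frac{1/4}{1/2} = \frac{1}{2}, \qquad p_{X_2 | X_1} (2 | 1) = \frac{1/4}{1/2} = \frac{1}{2}; \end{aligned}\]
that is, given one head on the first two tosses, the total count $X_2$ is $1$ or $2$ with probability $\tfrac{1}{2}$ each, exactly the distribution contributed by the third (fair) toss.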
$\mathbf{Definition}$ (Conditional pdf).
Now, let $X_1$ and $X_2$ denote random variables of the continuous type, which have the joint pdf $f_{X_1, X_2} (x_1,x_2)$ and the marginal probability density functions $f_{X_1} (x_1)$ and $f_{X_2} (x_2)$, respectively. In analogy with the discrete case, for any $x_1$ with $f_{X_1} (x_1) > 0$ we define
\[\begin{aligned} f_{X_2 | X_1} (x_2 | x_1) = \frac{f_{X_1, X_2} (x_1, x_2)}{f_{X_1} (x_1)}. \end{aligned}\]
We call $f_{X_2 | X_1} (x_2 | x_1)$ the conditional pdf of the continuous random variable $X_2$, given that the continuous random variable $X_1 = x_1$.
$\mathbf{Note.}$ In this relation, $x_1$ is to be thought of as having a fixed (but any fixed) value for which $f_{X_1} (x_1) > 0$. It is evident that $f_{X_2 | X_1} (x_2 | x_1)$ is nonnegative and that
\[\begin{aligned} \int_{-\infty}^{\infty} f_{X_2 | X_1} (x_2 | x_1) \, dx_2 = \frac{1}{f_{X_1} (x_1)} \int_{-\infty}^{\infty} f_{X_1, X_2} (x_1, x_2) \, dx_2 = \frac{f_{X_1} (x_1)}{f_{X_1} (x_1)} = 1. \end{aligned}\]
Also, the conditional probability that $a < X_2 < b$ given $X_1 = x_1$ is computed as
\[\begin{aligned} P[a < X_2 < b \mid X_1 = x_1] = \int_{a}^{b} f_{X_2 | X_1} (x_2 | x_1) \, dx_2. \end{aligned}\]
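As an illustrative sketch (pdf chosen for convenience, not from [1]), let $f_{X_1, X_2}(x_1, x_2) = 2$ for $0 < x_1 < x_2 < 1$, zero elsewhere. Then $f_{X_1}(x_1) = \int_{x_1}^{1} 2 \, dx_2 = 2(1 - x_1)$ for $0 < x_1 < 1$, and
\[\begin{aligned} f_{X_2 | X_1} (x_2 | x_1) = \frac{2}{2(1 - x_1)} = \frac{1}{1 - x_1}, \quad x_1 < x_2 < 1, \end{aligned}\]
so that, given $X_1 = x_1$, $X_2$ is uniformly distributed on $(x_1, 1)$.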
$\mathbf{Definition}$ (Conditional Expectation). If $u(X_2)$ is a function of the random variable $X_2$, then the conditional expectation of $u(X_2)$ given $X_1 = x_1$ (if it exists) is
\[\begin{aligned} \mathbb{E}[u(X_2) | x_1] = \int_{-\infty}^{\infty} u(x_2) f_{X_2 | X_1} (x_2 | x_1) \, dx_2. \end{aligned}\]
If it exists, then $\mathbb{E}[X_2 | x_1]$ is the conditional mean of the conditional distribution of $X_2$ given $X_1 = x_1$. If it exists, then $\mathbb{E}[(X_2 - \mathbb{E}[X_2 | x_1])^2 | x_1] = \text{Var}(X_2 | x_1) = \mathbb{E}(X_{2}^2 | x_1) - [\mathbb{E}(X_2 | x_1)]^2$ is the conditional variance of the conditional distribution of $X_2$ given $X_1 = x_1$.
$\mathbf{Example\ 2.3.1.}$
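Continuing the sketch above, since $X_2 \mid X_1 = x_1$ is uniform on $(x_1, 1)$,
\[\begin{aligned} \mathbb{E}[X_2 | x_1] = \frac{1 + x_1}{2}, \qquad \text{Var}(X_2 | x_1) = \frac{(1 - x_1)^2}{12}, \quad 0 < x_1 < 1. \end{aligned}\]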
$\mathbf{Thm\ 2.3.1.}$ Let $(X_1, X_2)$ be a random vector such that the variance of $X_2$ is finite. Then,
(a) $\mathbb{E}[\mathbb{E}(X_2 | X_1)] = \mathbb{E}(X_2)$
(b) $\text{Var}[\mathbb{E}(X_2 | X_1)] \leq \text{Var}(X_2)$
$\mathbf{Proof.}$
First, we prove (a). In the continuous case, note that
\[\begin{aligned} \mathbb{E}(X_2) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_2 f_{X_1, X_2} (x_1, x_2) \, dx_2 dx_1 = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x_2 \frac{f_{X_1, X_2} (x_1, x_2)}{f_{X_1} (x_1)} \, dx_2 \right] f_{X_1} (x_1) \, dx_1 \\ &= \int_{-\infty}^{\infty} \mathbb{E}(X_2 | x_1) f_{X_1} (x_1) \, dx_1 = \mathbb{E}[\mathbb{E}(X_2 | X_1)]. \end{aligned}\]
For (b), consider, with $\mu_2 = \mathbb{E}(X_2)$,
\[\begin{aligned} \text{Var}(X_2) &= \mathbb{E}[(X_2 - \mu_2)^2] = \mathbb{E}\left[ \left( X_2 - \mathbb{E}(X_2 | X_1) + \mathbb{E}(X_2 | X_1) - \mu_2 \right)^2 \right] \\ &= \mathbb{E}[(X_2 - \mathbb{E}(X_2 | X_1))^2] + \mathbb{E}[(\mathbb{E}(X_2 | X_1) - \mu_2)^2] + 2 \mathbb{E}[(X_2 - \mathbb{E}(X_2 | X_1))(\mathbb{E}(X_2 | X_1) - \mu_2)]. \end{aligned}\]
We claim that the last term is zero. Indeed, conditioning on $X_1$ first, $\mathbb{E}[(X_2 - \mathbb{E}(X_2 | X_1))(\mathbb{E}(X_2 | X_1) - \mu_2)] = \mathbb{E}\{ (\mathbb{E}(X_2 | X_1) - \mu_2) \, \mathbb{E}[X_2 - \mathbb{E}(X_2 | X_1) \mid X_1] \} = \mathbb{E}[(\mathbb{E}(X_2 | X_1) - \mu_2) \cdot 0] = 0$.
So, we have $\text{Var}(X_2) = \mathbb{E}[(X_2 - \mathbb{E}(X_2 | X_1))^2] + \mathbb{E}[(\mathbb{E}(X_2 | X_1) - \mu_2)^2] \geq \mathbb{E}[(\mathbb{E}(X_2 | X_1) - \mu_2)^2] = \text{Var}(\mathbb{E}[X_2 | X_1])._\blacksquare$
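A quick check of part (a) with the illustrative pdf $f_{X_1, X_2}(x_1, x_2) = 2$ on $0 < x_1 < x_2 < 1$: directly, $\mathbb{E}(X_2) = \int_0^1 \int_0^{x_2} 2 x_2 \, dx_1 dx_2 = \int_0^1 2 x_2^2 \, dx_2 = \tfrac{2}{3}$, while $\mathbb{E}(X_1) = \int_0^1 2 x_1 (1 - x_1) \, dx_1 = \tfrac{1}{3}$ gives
\[\begin{aligned} \mathbb{E}[\mathbb{E}(X_2 | X_1)] = \mathbb{E}\left[ \frac{1 + X_1}{2} \right] = \frac{1 + \tfrac{1}{3}}{2} = \frac{2}{3}, \end{aligned}\]
as the theorem asserts.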
2.4 Independent Random Variables
Let $X_1$ and $X_2$ denote random variables of the continuous type that have the joint pdf $f(x_1, x_2)$ and marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. In accordance with the definition of the conditional pdf $f_{2|1}(x_2|x_1)$, we may write the joint pdf $f(x_1,x_2)$ as
\[\begin{aligned} f(x_1, x_2) = f_{2|1} (x_2 | x_1) f_1 (x_1). \end{aligned}\]
Suppose that $f_{2|1} (x_2 | x_1)$ does not depend on $x_1$. Then the marginal pdf of $X_2$ is
\[\begin{aligned} f_2 (x_2) = \int_{-\infty}^{\infty} f_{2|1} (x_2 | x_1) f_1 (x_1) \, dx_1 = f_{2|1} (x_2 | x_1) \int_{-\infty}^{\infty} f_1 (x_1) \, dx_1 = f_{2|1} (x_2 | x_1). \end{aligned}\]
Accordingly, $f_2 (x_2) = f_{2|1} (x_2 | x_1)$ and $f(x_1, x_2) = f_1(x_1) f_2 (x_2)$.
$\mathbf{Def\ 2.4.1}$ (independence).
Let the random variables $X_1$ and $X_2$ have the joint pdf $f (x_1, x_2)$ [joint pmf $p(x_1,x_2)$] and the marginal pdfs [pmfs] $f_1(x_1)$ [ $p_1(x_1)$ ] and $f_2(x_2)$ [ $p_2(x_2)$ ], respectively.
The random variables $X_1$ and $X_2$ are said to be independent if and only if $f (x_1,x_2) = f_1(x_1)f_2(x_2)$ [ $p(x_1,x_2) = p_1(x_1)p_2(x_2)$ ].
Random variables that are not independent are said to be dependent.
$\mathbf{Remark\ 2.4.1.}$ The identity in the definition should be interpreted as follows. There may be certain points $(x_1,x_2) \in \mathbf{S}$ at which $f (x_1,x_2) \neq f_1(x_1)f_2(x_2)$. However, if $A$ is the set of points $(x_1,x_2)$ at which the equality does not hold, then $P(A)=0$. In subsequent theorems and the subsequent generalizations, a product of nonnegative functions and an identity should be interpreted in an analogous manner.
$\mathbf{Example\ 2.4.2.}$
However, it is possible to assert that the random variables $X_1$ and $X_2$ of $\mathbf{Example\ 2.4.2}$ are dependent without computing the marginal probability density functions, by using the following theorem.
$\mathbf{Thm\ 2.4.1.}$ Let the random variables $X_1$ and $X_2$ have supports $\mathbf{S_1}$ and $\mathbf{S_2}$, respectively, and have the joint pdf $f(x_1, x_2)$. Then $X_1$ and $X_2$ are independent if and only if $f(x_1, x_2)$ can be written as a product of a nonnegative function of $x_1$ and a nonnegative function of $x_2$. That is,
\[\begin{aligned} f(x_1, x_2) \equiv g(x_1) h(x_2), \end{aligned}\]
where $g(x_1) > 0$ for $x_1 \in \mathbf{S_1}$, zero elsewhere, and $h(x_2) > 0$ for $x_2 \in \mathbf{S_2}$, zero elsewhere.
$\mathbf{Proof.}$
If $X_1$ and $X_2$ are independent, then $f (x_1 , x_2) = f_1(x_1)f_2(x_2)$, so the condition is fulfilled with $g = f_1$ and $h = f_2$.
Conversely, if $f (x_1 , x_2) = g(x_1)h(x_2)$, then the marginal pdfs are
\[\begin{aligned} f_1 (x_1) = \int_{-\infty}^{\infty} g(x_1) h(x_2) \, dx_2 = g(x_1) \int_{-\infty}^{\infty} h(x_2) \, dx_2 = c_1 g(x_1) \end{aligned}\]
and
\[\begin{aligned} f_2 (x_2) = \int_{-\infty}^{\infty} g(x_1) h(x_2) \, dx_1 = h(x_2) \int_{-\infty}^{\infty} g(x_1) \, dx_1 = c_2 h(x_2), \end{aligned}\]
where $c_1 = \int_{-\infty}^{\infty} h(x_2) \, dx_2$ and $c_2 = \int_{-\infty}^{\infty} g(x_1) \, dx_1$ are constants. Moreover, $c_1 c_2 = 1$ since
\[\begin{aligned} 1 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2) \, dx_1 dx_2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x_1) h(x_2) \, dx_1 dx_2 = \left[ \int_{-\infty}^{\infty} g(x_1) \, dx_1 \right] \left[ \int_{-\infty}^{\infty} h(x_2) \, dx_2 \right] = c_2 c_1. \end{aligned}\]
Thus, $f(x_1, x_2) = g(x_1) h(x_2) = c_1 c_2 \, g(x_1) h(x_2) = [c_1 g(x_1)][c_2 h(x_2)] = f_1 (x_1) f_2 (x_2)._\blacksquare$
$\mathbf{Example\ 2.4.3.}$
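As an illustration of the theorem (with pdfs chosen for convenience, not from [1]), $f(x_1, x_2) = 4 x_1 x_2$ on $0 < x_1 < 1$, $0 < x_2 < 1$ factors as $g(x_1) h(x_2)$ with $g(x_1) = 2 x_1$ and $h(x_2) = 2 x_2$, so the corresponding $X_1$ and $X_2$ are independent. On the other hand, for $f(x_1, x_2) = x_1 + x_2$ on the same square, a factorization $f = g(x_1) h(x_2)$ would force $f(a_1, b_1) f(a_2, b_2) = f(a_1, b_2) f(a_2, b_1)$ for all points of the support; taking $(a_1, b_1) = (\tfrac{1}{4}, \tfrac{1}{4})$ and $(a_2, b_2) = (\tfrac{3}{4}, \tfrac{3}{4})$,
\[\begin{aligned} f(a_1, b_1) f(a_2, b_2) = \tfrac{1}{2} \cdot \tfrac{3}{2} = \tfrac{3}{4} \neq 1 = f(a_1, b_2) f(a_2, b_1), \end{aligned}\]
so no such factorization exists and $X_1$, $X_2$ are dependent.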
Also, instead of working with pdfs (or pmfs), we could have presented independence in terms of cumulative distribution functions. The following theorem shows the equivalence.
$\mathbf{Thm\ 2.4.2.}$ Let $(X_1, X_2)$ have the joint cdf $F(x_1, x_2)$ and let $X_1$ and $X_2$ have the marginal cdfs $F_1 (x_1)$ and $F_2 (x_2)$, respectively. Then $X_1$ and $X_2$ are independent iff
\[\begin{aligned} F(x_1, x_2) = F_1 (x_1) F_2 (x_2) \end{aligned}\]
for all $(x_1, x_2) \in \mathbb{R}^2$.
$\mathbf{Proof.}$
Suppose the expression holds. Then, in the continuous case, the mixed second partial derivative is
\[\begin{aligned} \frac{\partial^2 F(x_1, x_2)}{\partial x_1 \partial x_2} = \frac{\partial^2 [F_1 (x_1) F_2 (x_2)]}{\partial x_1 \partial x_2} = f_1 (x_1) f_2 (x_2), \end{aligned}\]
so the joint pdf factors into the product of the marginal pdfs. Hence, $X_1$ and $X_2$ are independent.
Conversely, suppose they are independent. Then, by the definition of the joint cdf,
\[\begin{aligned} F(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f_1 (w_1) f_2 (w_2) \, dw_2 dw_1 = \left[ \int_{-\infty}^{x_1} f_1 (w_1) \, dw_1 \right] \left[ \int_{-\infty}^{x_2} f_2 (w_2) \, dw_2 \right] = F_1 (x_1) F_2 (x_2). _\blacksquare \end{aligned}\]
The next theorem frequently simplifies the calculation of probabilities of events that involve independent random variables.
$\mathbf{Thm\ 2.4.3.}$ The random variables $X_1$ and $X_2$ are independent random variables iff the following condition holds:
\[\begin{aligned} P[a < X_1 \leq b, c < X_2 \leq d] = P[a < X_1 \leq b] \, P[c < X_2 \leq d] \end{aligned}\]
for every $a < b$ and $c < d$, where $a, b, c, d$ are constants.
$\mathbf{Proof.}$
If $X_1$ and $X_2$ are independent, then by Thm 2.4.2 and the rectangle formula for the joint cdf,
\[\begin{aligned} P[a < X_1 \leq b, c < X_2 \leq d] &= F(b, d) - F(a, d) - F(b, c) + F(a, c) \\ &= F_1(b) F_2(d) - F_1(a) F_2(d) - F_1(b) F_2(c) + F_1(a) F_2(c) \\ &= [F_1(b) - F_1(a)][F_2(d) - F_2(c)] = P[a < X_1 \leq b] \, P[c < X_2 \leq d]. \end{aligned}\]
Conversely, letting $a \to -\infty$ and $c \to -\infty$ with $b = x_1$ and $d = x_2$, the condition implies that the joint cdf factors into the product of the marginal cdfs, so the result follows from Thm 2.4.2. $_\blacksquare$
$\mathbf{Example\ 2.4.4.}$
Not merely are calculations of some probabilities usually simpler when we have independent random variables, but many expectations, including certain moment generating functions, have comparably simpler computations. The following result proves so useful that we state it in the form of a theorem.
$\mathbf{Thm\ 2.4.4.}$ Suppose $X_1$ and $X_2$ are independent and that $\mathbb{E}(u(X_1))$ and $\mathbb{E}(v(X_2))$ exist. Then
\[\begin{aligned} \mathbb{E}[u(X_1) v(X_2)] = \mathbb{E}[u(X_1)] \, \mathbb{E}[v(X_2)]. \end{aligned}\]
$\mathbf{Thm\ 2.4.5.}$ Suppose the joint mgf, $M(t_1, t_2)$, exists for the random variables $X_1$ and $X_2$. Then $X_1$ and $X_2$ are independent if and only if
\[\begin{aligned} M(t_1, t_2) = M(t_1, 0) \, M(0, t_2); \end{aligned}\]
that is, the joint mgf is identically equal to the product of the marginal mgfs.
$\mathbf{Proof.}$
If $X_1$ and $X_2$ are independent, then $M(t_1, t_2) = E(e^{t_1 X_1}e^{t_2 X_2}) = E(e^{t_1 X_1})E(e^{t_2 X_2}) = M(t_1, 0) M(0, t_2)$ by Thm 2.4.4.
Conversely, suppose the condition holds. Since $M(t_1, 0)$ and $M(0, t_2)$ are the mgfs of $X_1$ and $X_2$, $M(t_1, t_2) = M(t_1, 0) M(0, t_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t_1 x_1 + t_2 x_2} f_1 (x_1) f_2 (x_2) \, dx_1 dx_2$.
But $M(t_1, t_2)$ is the mgf of $(X_1, X_2)$, so we also have $M(t_1, t_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t_1 x_1 + t_2 x_2} f (x_1, x_2) \, dx_1 dx_2$.
The uniqueness of the mgf implies that $f(x_1, x_2) = f_1 (x_1) f_2 (x_2)._\blacksquare$
$\mathbf{Example\ 2.4.5.}$
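Tying back to the illustrative pdf $f(x_1, x_2) = e^{-x_1 - x_2}$, $x_1, x_2 > 0$, from Section 2.1: its joint mgf satisfies
\[\begin{aligned} M(t_1, t_2) = \frac{1}{(1 - t_1)(1 - t_2)} = M(t_1, 0) \, M(0, t_2), \quad t_1 < 1, \ t_2 < 1, \end{aligned}\]
so by Thm 2.4.5 the corresponding $X_1$ and $X_2$ are independent, consistent with the obvious factorization $e^{-x_1 - x_2} = e^{-x_1} e^{-x_2}$.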
Reference
[1] Hogg, R., McKean, J., & Craig, A., Introduction to Mathematical Statistics, Pearson, 2019.