
(This post is a summary of Chapter 1 of [1])

1.1 Introduction

Many kinds of investigations may be characterized in part by the fact that repeated experimentation, under essentially the same conditions, is more or less standard procedure (e.g., medical research).

But it is characteristic of these experiments that the outcome cannot be predicted with certainty beforehand. Suppose we have such an experiment, and that the collection of every possible outcome can be described prior to its performance.

If this kind of experiment can be repeated under the same conditions, it is called a random experiment, and the collection of every possible outcome is called the sample space $\Omega$. Subsets of $\Omega$ are often called events.

The primary purpose of having a mathematical theory of statistics is to provide mathematical models for random experiments.



1.2 Sets

1.2.1 Review of Set Theory

Let’s skip the basic concepts (union, intersection, disjoint, …)

$\mathbf{Definition.}$ monotone
We occasionally have sequences of sets that are monotone:

nondecreasing (nested upward)
We say a sequence of sets $\{A_n\}$ is nondecreasing if $A_n \subset A_{n+1}$ for $n = 1, 2, 3, \ldots$.
For such a sequence, we define

$\lim_{n \rightarrow \infty} A_n = \bigcup_{n=1}^{\infty} A_n$

nonincreasing (nested downward)
We say a sequence of sets $\{A_n\}$ is nonincreasing if $A_n \supset A_{n+1}$ for $n = 1, 2, 3, \ldots$.
For such a sequence, we define

$\lim_{n \rightarrow \infty} A_n = \bigcap_{n=1}^{\infty} A_n$


1.2.2 Set Functions

Many of the functions used in calculus map real numbers to real numbers. We are concerned here also with functions that map sets into real numbers. Such functions are naturally called functions of a set or, more simply, set functions.





1.3 The Probability Set Function

We are interested in assigning probabilities to events, i.e., subsets of $\Omega$. What, then, should our collection of events be? If the sample space $\Omega$ is a finite set, we could take the set of all subsets as this collection, but for infinite sample spaces the question is not so simple.

So, we assume that in all cases, the collection of events is sufficiently rich to include all possible events of interest and is closed under complements and countable unions of these events. We denote this collection of events by $\mathcal{F}$. Technically, such a collection of events is called a $\sigma$-field of subsets.

$\mathbf{Definition}$ (field). A field is a non-empty collection of subsets of sample space $\Omega$ closed under finite union, finite intersection and complements. A synonym for field is algebra. A minimal set of postulates for $\mathcal{F}$ to be a field is:

(i) $\Omega \in \mathcal{F}$
(ii) $A \in \mathcal{F} \Rightarrow A^C \in \mathcal{F}$
(iii) $A, B \in \mathcal{F} \Rightarrow A \cup B \in \mathcal{F}$



$\mathbf{Definition}$ ($\sigma$-field). A $\sigma$-field is a non-empty collection of subsets of sample space $\Omega$ closed under countable union, countable intersection and complements. A synonym for $\sigma$-field is $\sigma$-algebra. A minimal set of postulates for $\mathcal{F}$ to be a $\sigma$-field is:

(i) $\Omega \in \mathcal{F}$
(ii) $A \in \mathcal{F} \Rightarrow A^C \in \mathcal{F}$
(iii) $A_1, A_2, \cdots \in \mathcal{F} \Rightarrow \bigcup_{n=1}^\infty A_n \in \mathcal{F}$

$\mathbf{Example.}$ Some examples of $\sigma$-fields
  • The power set $\mathcal{P}(\Omega)$ of $\Omega$ is a $\sigma$-field.
  • The trivial $\sigma$-field: \(\{\varnothing, \Omega\}\) is a $\sigma$-field.
  • The countable/co-countable $\sigma$-field: let $\Omega=\mathbb{R}$; then
\[\begin{align*} \mathcal{B}= \{ A \subset \mathbb{R}: A \text{ is countable} \} \cup \{A \subset \mathbb{R}: A^c \text{ is countable} \} \end{align*}\]

is a $\sigma$-field.
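For a finite sample space, the countable conditions reduce to finite ones, so the postulates can be checked by brute force. A minimal sketch (the helpers `power_set` and `is_sigma_field` are illustrative names, not from the text):

```python
from itertools import combinations

# Enumerate the power set of a finite sample space as frozensets.
def power_set(omega):
    s = list(omega)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def is_sigma_field(F, omega):
    """Check the (finite-case) postulates: Omega in F,
    closure under complement, closure under union."""
    omega = frozenset(omega)
    F = {frozenset(A) for A in F}
    if omega not in F:
        return False
    if any(omega - A not in F for A in F):
        return False
    # For a finite collection, countable union reduces to pairwise union.
    return all(A | B in F for A in F for B in F)

omega = {1, 2, 3}
print(is_sigma_field(power_set(omega), omega))                  # True: power set
print(is_sigma_field([frozenset(), frozenset(omega)], omega))   # True: trivial
# False: the complement {2, 3} of {1} is missing.
print(is_sigma_field([frozenset(), frozenset({1}), frozenset(omega)], omega))
```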





Now that we have a sample space, $\Omega$, and our collection of events, $\sigma$-field $\mathcal{F}$, we can define

$\mathbf{Def\ 1.3.1}$ (Probability)
Let $\Omega$ be a sample space and let $\mathcal{F}$ be the set of events, a $\sigma$-field. Let $P$ be a real-valued function defined on $\mathcal{F}$. Then $P$ is a probability set function, or probability measure, if $P$ satisfies the following three conditions:

(1) $P(A) \geq 0$, for all $A \in \mathcal{F}$
(2) $P(\Omega) = 1$
(3) $P$ is $\sigma$-additive: if $\{A_n\}$ is a sequence of events in $\mathcal{F}$ and $A_m \cap A_n = \varnothing$ for all $m \neq n$, i.e., they are disjoint, then

$P(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} P(A_n)$


We call such a triple $(\Omega, \mathcal{F}, P)$ a probability space.

$\mathbf{Remark\ 1.3.1.}$ The definition of probability consists of three axioms, motivated by the intuitive behavior of relative frequencies. Some terminology related to axiom (3):

  • Mutually exclusive
    A collection of events whose elements are pairwise disjoint, as in (3), is said to be a mutually exclusive collection. (Its union is often referred to as a disjoint union.)

  • Exhaustive
    The collection is further said to be exhaustive if the union of its events is the sample space, in which case $\sum_{n=1}^{\infty} P(A_n) = 1$.

  • Partition
    If the collection is both mutually exclusive and exhaustive, we often say that it forms a partition of the sample space.


$\mathbf{Thm\ 1.3.1.}$ For each event $A \in \mathcal{F}, P(A) = 1 - P(A^c)$.

$\mathbf{Proof.}$

$\Omega = A \cup A^c$ and $\varnothing = A \cap A^c$. From (2) and (3), $1 = P(A) + P(A^c)._\blacksquare$


$\mathbf{Thm\ 1.3.2.}$ $P(\varnothing) = 0$.

$\mathbf{Proof.}$

From Thm 1.3.1 and (2), done $._\blacksquare$


$\mathbf{Thm\ 1.3.3.}$ If $A$ and $B$ are events s.t. $A \subset B$, then $P(A) \leq P(B)$.

$\mathbf{Proof.}$

From the disjoint union $B = A \cup (B \cap A^c)$ and (1) & (3), done $._\blacksquare$


$\mathbf{Thm\ 1.3.4.}$ For each event $A \in \mathcal{F}, 0 \leq P(A) \leq 1$.

$\mathbf{Proof.}$

From Thm 1.3.3 (set $B$ as the sample space $\Omega$) and (1), done $._\blacksquare$


$\mathbf{Thm\ 1.3.5.}$ If $A$ and $B$ are events, then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

$\mathbf{Proof.}$

Note that $A \cup B = A \cup (A^c \cap B)$, a disjoint union. So, $P(A \cup B) = P(A) + P(A^c \cap B)$ by (3).
Note that $B = (A \cap B) \cup (A^c \cap B)$, also a disjoint union. So, $P(A^c \cap B) = P(B) - P(A \cap B)$, and substituting into the first equation gives the result $._\blacksquare$


$\mathbf{Thm\ 1.3.6}$ (Subadditivity).
For events $A_n$, $n \geq 1$, it holds that

\[\begin{align*} P(\bigcup_{n=1}^\infty A_n ) \leq \sum_{n=1}^\infty P(A_n). \end{align*}\]
$\mathbf{Proof.}$

To verify this, we first show that

\[\begin{align*} \bigcup_{n=1}^\infty A_n = A_1 \cup (A_2 - A_1) \cup (A_3 - (A_2 \cup A_1)) \cup \cdots \end{align*}\]

The proof is straightforward: if $x$ is in the set on the right, then $x \in A_n - (A_1 \cup A_2 \cup \cdots \cup A_{n-1})$ for some $n \in \mathbb{N}$, so it also belongs to the set on the left. Conversely, if $x$ is in the set on the left, then $x \in A_n$ for some $n \in \mathbb{N}$. Let $n_0$ be the smallest such $n$. Then $x \in A_{n_0} - (A_1 \cup A_2 \cup \cdots \cup A_{n_0 - 1})$.


Since the sets on the right are disjoint and $P$ is $\sigma$-additive, we derive

\[\begin{align*} P(\bigcup_{n=1}^\infty A_n) &= P(A_1) + P(A_2 - A_1) + P(A_3 - (A_2 \cup A_1)) + \cdots \\ &\leq P(A_1) + P(A_2) + P(A_3) + \cdots._\blacksquare \end{align*}\]


$\mathbf{Thm\ 1.3.7.}$ The inclusion-exclusion formula.

\[\begin{aligned} P\left(\bigcup_{i=1}^n A_i\right)= & \sum_{i=1}^n P\left(A_i\right)-\sum_{i<j} P\left(A_i \cap A_j\right)+\sum_{i<j<k} P\left(A_i \cap A_j \cap A_k\right)-\cdots \\ & +(-1)^{n-1} P\left(A_1 \cap \cdots \cap A_n\right). \end{aligned}\]
$\mathbf{Proof.}$

It can be proved by mathematical induction and the fact that $P(A \cup B) = P(A) + P(B) - P(A \cap B)$:

\[\begin{aligned} P\left(\bigcup_{i=1}^{n+1} A_i\right) & =P\left(\bigcup_{i=1}^n A_i\right)+P\left(A_{n+1} \backslash \bigcup_{i=1}^n A_i\right) \\ & =P\left(\bigcup_{i=1}^n A_i\right)+P\left(A_{n+1}\right)-P\left(\bigcup_{i=1}^n\left(A_i \cap A_{n+1}\right)\right) . \end{aligned}\]
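The formula can be checked numerically, assuming an equilikely measure on a small finite sample space (the sample space and the events are chosen arbitrarily for illustration):

```python
from itertools import combinations
from fractions import Fraction

# Equilikely measure on a finite sample space (an illustrative assumption).
omega = set(range(1, 13))
P = lambda E: Fraction(len(E), len(omega))

A = [{1, 2, 3, 4, 5}, {4, 5, 6, 7}, {2, 5, 8, 9, 10}]

# Left side: probability of the union, computed directly.
lhs = P(set().union(*A))

# Right side: the alternating inclusion-exclusion sum over all
# non-empty subsets of indices.
n = len(A)
rhs = Fraction(0)
for k in range(1, n + 1):
    for idx in combinations(range(n), k):
        inter = set.intersection(*(A[i] for i in idx))
        rhs += (-1) ** (k - 1) * P(inter)

print(lhs == rhs, lhs)  # True 5/6
```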



$\mathbf{Def\ 1.3.2}$ (Equilikely Case)
Let

\[\begin{align*} \mathcal{C} = \{ x_1, x_2, \ldots, x_m \} \end{align*}\]

be a finite sample space. Let $p_i = 1/m$ for all $i = 1, 2, \ldots, m$, and for all subsets $A$ of $\mathcal{C}$ define

$P(A) = \sum_{x_i \in A} \frac{1}{m} = \frac{\#(A)}{m},$


where the numerator denotes the number of elements in $A$. Then $P$ is a probability on $\mathcal{C}$, and it is referred to as the equilikely case.


Because the equilikely case is often of interest, we next develop some counting rules which can be used to compute the probabilities.

1.3.1 Counting Rules

a) multiplication rule

It is quite simple; consider the following case:

There are five roads (ways) between cities I and II and there are ten roads (ways) between cities II and III. Hence, there are $5 \times 10 = 50$ ways to get from city I to city III by going from city I to city II and then from city II to city III.

b) permutation

Let $A$ be a set with $n$ elements.

Suppose we are interested in $k$-tuples whose components are elements of $A$. Then, there are $n \cdot n \cdots n = n^k$ such $k$-tuples.

Next, suppose $k \leq n$ and we are interested in $k$-tuples whose components are distinct (no repeats) elements of $A$. Then, there are $n \cdot (n-1) \cdots (n - (k-1))$ such $k$-tuples. We call each such $k$-tuple a permutation and use the symbol $P_{k}^{n}$ to denote the number of permutations of $k$ elements taken from a set of $n$ elements.

\[\begin{align*} P_{k}^{n} = n \cdot (n-1) \cdots (n - (k-1)) = \frac {n!}{(n-k)!} \end{align*}\]

c) combination

Now, suppose order is not important, so instead of counting the number of permutations we want to count the number of subsets of $k$ elements taken from $A$.

We use the symbol $\binom{n}{k}$ to denote the number of subsets of $k$ elements taken from $A$. By the permutation rule, each such subset generates $P_{k}^{k} = k \cdot (k-1) \cdots 1 = k!$ permutations.

And, all these permutations are distinct from the permutations generated by other subsets.

Finally, each permutation of $k$ distinct elements drawn from $A$ must be generated by one of these subsets.

$\therefore P_{k}^{n} = \binom{n}{k}k!$; hence $\binom{n}{k} = \frac{n!}{k!(n-k)!}$.
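These rules map directly onto Python's `math.perm` and `math.comb`. A quick sketch (the card-hand probability at the end is an added illustration, not from the text):

```python
import math

# Multiplication rule: 5 roads from I to II, 10 from II to III.
assert 5 * 10 == 50

# Permutations: ordered k-tuples of distinct elements from n.
n, k = 10, 3
print(math.perm(n, k))   # 10 * 9 * 8 = 720
assert math.perm(n, k) == math.factorial(n) // math.factorial(n - k)

# Combinations: unordered k-subsets; each subset yields k! permutations.
print(math.comb(n, k))   # 720 / 3! = 120
assert math.perm(n, k) == math.comb(n, k) * math.factorial(k)

# Equilikely application (hypothetical example): the probability that a
# random 5-card hand from a 52-card deck is all hearts.
p = math.comb(13, 5) / math.comb(52, 5)
print(p)   # about 0.000495
```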

1.3.2 Additional Properties of Probability

Several additional properties of probability prove useful in the sequel.

First, consider $\displaystyle \lim_{n \to \infty} P(C_n)$ for a nondecreasing sequence $\{ C_n \}$. The question is: can we legitimately interchange the limit and $P$?

$\mathbf{Thm\ 1.3.8.}$ Let $\{ C_n \}$ be a nondecreasing sequence of events. Then

\[\begin{align*} \displaystyle \lim_{n \to \infty} P(C_n) = P(\displaystyle \lim_{n \to \infty} C_n) = P(\bigcup_{n=1}^{\infty} C_n). \end{align*}\]

Let $\{ C_n \}$ be a nonincreasing sequence of events. Then

\[\begin{align*} \displaystyle \lim_{n \to \infty} P(C_n) = P(\displaystyle \lim_{n \to \infty} C_n) = P(\bigcap_{n=1}^{\infty} C_n). \end{align*}\]
$\mathbf{Proof.}$

Let’s prove the first result.
Define the sets $\{R_n\}$ as $R_1 = C_1$ and, for $n > 1$, $R_n = C_n \cap C_{n-1}^{c}$.

It follows that $\bigcup_{n=1}^{\infty} C_n = \bigcup_{n=1}^{\infty} R_n$ and that $R_m \cap R_n = \varnothing$ for $m \neq n$.
Also, $P(R_n) = P(C_n) - P(C_{n-1})$ for $n > 1$, since $C_{n-1} \subset C_n$. By the third axiom of probability,

\[\begin{align*} P[\displaystyle \lim_{n \to \infty} C_n] &= P(\bigcup_{n=1}^{\infty} R_n) \\ &= \sum_{n=1}^{\infty} P(R_n) \\ &=\displaystyle \lim_{n \to \infty} \sum_{j=1}^{n} P(R_j) \\ &= \displaystyle \lim_{n \to \infty} \left\{ P(C_1) + \sum_{j=2}^{n} [P(C_j) - P(C_{j-1})] \right\} \\ &= \displaystyle \lim_{n \to \infty} P(C_n). \end{align*}\]

Now, for the second result, observe that $\{ C_{n}^{c} \}$ is a nondecreasing sequence of events and

\[\begin{align*} P(\bigcup_{n=1}^{\infty} C_{n}^{c}) = 1 - P(\bigcap_{n=1}^{\infty} C_{n}). \end{align*}\]

Applying the first result to $\{ C_{n}^{c} \}$ and using $\displaystyle \lim_{n \to \infty} P(C_{n}^{c}) = \displaystyle \lim_{n \to \infty} (1 - P(C_n)) = 1 - \displaystyle \lim_{n \to \infty} P(C_n)$, we are done. $_\blacksquare$
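As a concrete illustration of the first result, assume (for this sketch only) the uniform "length" measure on $(0, 1]$ and take the nondecreasing sequence $C_n = (0, 1 - 1/n]$, whose union is $(0, 1)$:

```python
from fractions import Fraction

# Under the uniform length measure on (0, 1] (an illustrative assumption),
# the nondecreasing sequence C_n = (0, 1 - 1/n] has P(C_n) = 1 - 1/n,
# while its union (0, 1) has probability 1.
def P_Cn(n):
    return 1 - Fraction(1, n)

probs = [P_Cn(n) for n in (2, 10, 100, 10_000)]
print([float(p) for p in probs])  # [0.5, 0.9, 0.99, 0.9999] -> tends to 1
```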




1.4 Conditional Probability and Independence

In some random experiments, we are interested only in those outcomes that are elements of a subset $A$ of the sample space $\mathcal{C}$. This means, for our purposes, that the sample space is effectively the subset $A$. We are now confronted with the problem of defining a probability set function with $A$ as the “new” sample space.

$\mathbf{Def\ 1.4.1}$ (Conditional Probability)
Let $A$ and $B$ be events with $P(A) > 0$. Then we define the conditional probability of $B$ given $A$ as

\[\begin{align*} P(B|A) = \frac {P(A \cap B)}{P(A)}._\blacksquare \end{align*}\]

$\mathbf{Remark.}$
1) From the definition, we observe that $P(A \cap B) = P(A) \cdot P(B | A)$. This relation is frequently called the multiplication rule for probabilities.

2) It can be extended to three or more events. In the case of three events,
\(\begin{align*} P(A \cap B \cap C) = P[(A \cap B) \cap C] = P(A \cap B)P(C|A \cap B). \end{align*}\)

The general formula for $k$ events can be proved by mathematical induction.

3) Moreover, we have

  1. $P(B | A) \geq 0$.
  2. $P(A | A) = 1$.
  3. $P(\bigcup_{n=1}^{\infty} B_n | A) = \sum_{n=1}^{\infty} P(B_n | A)$, provided that $B_1, B_2, …$ are mutually exclusive events.

Properties (1) and (2) are evident.
For (3), \(\begin{align*} P(\bigcup_{n=1}^{\infty} B_n | A) = \frac {P[\bigcup_{n=1}^{\infty} (B_n \cap A)]} {P(A)} = \sum_{n=1}^{\infty} \frac {P[B_n \cap A]} {P(A)} = \sum_{n=1}^{\infty} P(B_n | A). \end{align*}\)

Note that these properties are precisely the conditions that a probability set function must satisfy. Accordingly, $P(B | A)$ is a probability set function, defined for subsets of $A$.
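These three properties can be verified directly on a small example. A sketch assuming a fair die, with $A$ the event of an even roll (both choices are illustrative):

```python
from fractions import Fraction

# Equilikely die; condition on A = "roll is even" and check that
# B |-> P(B | A) behaves like a probability set function.
omega = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(omega))

A = {2, 4, 6}
def P_given_A(B):
    return P(A & B) / P(A)

print(P_given_A({2}))   # 1/3: one of the three even faces
print(P_given_A(A))     # 1: P(A | A) = 1
# Additivity over the disjoint events {2}, {4}, {6}:
print(P_given_A({2, 4, 6}) == P_given_A({2}) + P_given_A({4}) + P_given_A({6}))  # True
```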

$\mathbf{Thm\ 1.4.1}$ (Bayes’ Theorem)
Let $A_1, A_2, \ldots, A_k$ be events such that $P(A_i) > 0$, $i = 1, 2, \ldots, k$. Assume further that $A_1, A_2, \ldots, A_k$ are mutually exclusive and exhaustive. Let $B$ be any event. Then

\[\begin{align*} P(A_j | B) = \frac {P(A_j) P(B|A_j)} {\sum_{i=1}^{k} P(A_i) P(B|A_i)}._\blacksquare \end{align*}\]
$\mathbf{Proof.}$
By definition and the multiplication rule, we have $P(A_j | B)= \frac{P(A_j \cap B)}{P(B)} = \frac {P(A_j) P(B|A_j)} {P(B)}$. And, since the $A_i$ are exhaustive,
\[\begin{align*} B = B \cap (A_1 \cup A_2 \cup \cdots \cup A_k) = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_k). \end{align*}\]

Since $B \cap A_i$, $i=1, 2, \ldots, k$, are mutually exclusive, we have

\[\begin{align*} P(B) = \sum_{i=1}^{k} P(B \cap A_i) = \sum_{i=1}^{k} P(A_i) P(B|A_i)._\blacksquare \end{align*}\]

Note that this expansion of $P(B)$ is the law of total probability.


$\mathbf{Example.}$

Bayes' theorem describes how the probability of the event $X=x$ changes once the event $Y=y$ has occurred (i.e., $P(Y=y) = 1$). In other words, given that event $B$ has occurred, it lets us reason in reverse: from the forward relation "event $A$ occurs, then event $B$ occurs," we infer how likely it is that event $A$ occurred given that event $B$ did.


For example, suppose we run a medical test that diagnoses cancer. If the test comes back positive, how likely is it that the patient has cancer? Suppose the test has 80% sensitivity (if the patient has cancer, the probability that the test is positive is 0.8). That is, letting $x = 1$ denote a positive test and $y = 1$ denote the patient having cancer, $p(x=1 \mid y=1) = 0.8$.

Can we then diagnose cancer from this test alone? What is $p(y=1 \mid x=1)$, the probability that the patient actually has cancer given a positive result? One might guess 80%, but that is wrong.

1) Base rate fallacy

We must take the prior $p(y=1)=0.004$ into account. Since so few people have cancer in the first place, the posterior probability will be far smaller than 80%.

2) False positive (false alarm)

We must also account for the false-positive rate, $p(x=1 \mid y=0) = 0.1$. Then

$p(y=1 | x=1) = \frac{p(x=1 | y=1)p(y=1)}{p(x=1 | y=1)p(y=1)+p(x=1 | y=0)p(y=0)} \approx 0.031$

That is, even when the test is positive, the probability that the patient actually has cancer is only about 3%. If this result feels counterintuitive, it is easier to understand in terms of frequencies (head counts) rather than probabilities.

Out of 1000 people, only 4 actually have cancer. Since 80% of those 4 test positive, about 3 of the true patients receive a positive result and 1 a negative result. The remaining 996 people do not have cancer, yet 10% of them, about 100, still test positive. So roughly 103 people in total test positive, of whom only about 3 actually have cancer.

This happens because healthy people vastly outnumber cancer patients, so for a relatively inaccurate test, healthy people account for most of the positive results.
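The example above can be computed directly:

```python
# The cancer-test numbers above: sensitivity 0.8, prior 0.004,
# false-positive rate 0.1, plugged into Bayes' theorem.
sens  = 0.8    # p(x=1 | y=1)
prior = 0.004  # p(y=1)
fpr   = 0.1    # p(x=1 | y=0)

posterior = sens * prior / (sens * prior + fpr * (1 - prior))
print(round(posterior, 3))  # 0.031

# The same computation as head counts out of 1000 people:
true_pos  = 1000 * prior * sens        # ~3.2 patients test positive
false_pos = 1000 * (1 - prior) * fpr   # ~99.6 healthy people test positive
print(round(true_pos / (true_pos + false_pos), 3))  # 0.031
```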




1.4.1 Independence

Sometimes, however, the occurrence of event $A$ does not change the probability of event $B$; that is, when $P(A) > 0$,

\[\begin{align*} P(B|A) = P(B). \end{align*}\]

$\mathbf{Def\ 1.4.2}$ (Independent)
Let $A$ and $B$ be two events. We say that $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)._\blacksquare$

Events that are independent are sometimes called statistically independent, stochastically independent, or independent in a probability sense. In most instances, we use independent without a modifier if there is no possibility of misunderstanding.

$\mathbf{Remark.}$
If $A$ and $B$ are independent events, then the following three pairs of events ($A^c$ and $B$, $A$ and $B^c$, $A^c$ and $B^c$) are also independent.

$\mathbf{Proof.}$

WLOG, it suffices to show that $A^c$ and $B$ are independent.
From the disjoint union $B = (A \cap B) \cup (A^c \cap B)$, $P(A^c \cap B) = P(B) - P(A \cap B) = P(B)[1-P(A)] = P(B)P(A^c)._\blacksquare$



Suppose now that we have $n$ events $A_1, A_2, \ldots, A_n$. We say that they are (mutually) independent iff for every $k$ with $2 \leq k \leq n$ and every selection of distinct indices $d_1, d_2, \ldots, d_k$ from $1, 2, \ldots, n$,

\[\begin{align*} P(A_{d_1} \cap A_{d_2} \cap \cdots \cap A_{d_k}) = P(A_{d_1})P(A_{d_2}) \cdots P(A_{d_k}). \end{align*}\]

In particular, if $A_1, …, A_n$ are mutually independent, then

\[\begin{align*} P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n). \end{align*}\]

Also, as with two events, many combinations of these events and their complements are independent, such as $A_{1}^c$ and $A_2 \cup A_{3}^c \cup A_4$.
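The subcollection condition matters: the product rule can hold for every pair of events while failing for the triple. A classic added example (not from the text) with two fair coin flips:

```python
from fractions import Fraction
from itertools import product

# Pairwise independence does not imply mutual independence.
omega = set(product("HT", repeat=2))     # two fair coin flips, equilikely
P = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w[0] == "H"}    # first flip is heads
B = {w for w in omega if w[1] == "H"}    # second flip is heads
C = {w for w in omega if w[0] == w[1]}   # the two flips agree

# Every pair satisfies the product rule ...
print(all(P(X & Y) == P(X) * P(Y) for X, Y in [(A, B), (A, C), (B, C)]))  # True
# ... but the triple does not: A, B, C are not mutually independent.
print(P(A & B & C) == P(A) * P(B) * P(C))   # False: 1/4 != 1/8
```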




1.5 Random Variables

The reader perceives that a sample space $\mathcal{C}$ may be tedious to describe if the elements of $\mathcal{C}$ are not numbers. We now discuss how we may formulate a rule, or a set of rules, by which the elements $c$ of $\mathcal{C}$ may be represented by numbers.

$\mathbf{Def\ 1.5.1}$ (Random Variable)
Consider a random experiment with a sample space $\mathcal{C}$. A function $X$, which assigns to each element $c \in \mathcal{C}$ one and only one number $X(c) = x$, is called a random variable. The space or range of $X$ is the set of real numbers $D$

\[\begin{align*} D = \{x: x = X(c), c \in \mathcal{C} \}._\blacksquare \end{align*}\]

$D$ is generally a countable set (in which case we call $X$ a discrete random variable) or an interval of real numbers (in which case we call $X$ a continuous random variable).

Given a random variable $X$, its range $D$ becomes the sample space of interest. Besides inducing the sample space, it also induces a probability which we call the distribution of $X$.

$\mathbf{Def\ 1.5.2}$ (Cumulative Distribution Function, cdf)
Let $X$ be a random variable. Then, its cumulative distribution function (cdf) is defined by $F_X (x)$, where

\[\begin{align*} F_X (x) = P_X ((-\infty, x]) = P(\{c \in \mathcal{C} : X(c) \leq x \})._\blacksquare \end{align*}\]

Also, $F_X (x)$ is often called simply the distribution function (df).

$\mathbf{Ex\ 1.5.3.}$

Suppose we roll a fair die with the numbers 1 through 6 on it. Let $X$ be the upface of the roll. Then the space of $X$ is $\{ 1, 2, \ldots, 6 \}$ and its pmf is $p_{X} (i) = 1/6$ for $i = 1, 2, \ldots, 6$.
If $x < 1$, then $F_X (x) = 0$. If $1 \leq x < 2$, then $F_X (x) = 1/6$. Continuing this way, we see that the cdf of $X$ is an increasing step function which steps up by $p_X (i)$ at each $i$ in the space of $X$ (graph omitted).
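The cdf of this example is easy to compute directly. A sketch (the function name `F_X` simply mirrors the notation above):

```python
from fractions import Fraction

# The cdf of a fair die: F_X(x) = P(X <= x), a right-continuous
# step function that jumps by 1/6 at each of 1, ..., 6.
def F_X(x):
    return Fraction(sum(1 for i in range(1, 7) if i <= x), 6)

print(F_X(0.5))   # 0: below the support
print(F_X(1))     # 1/6: the first jump, at x = 1
print(F_X(2.7))   # 1/3: two faces (1 and 2) are <= 2.7
print(F_X(6))     # 1
```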



equal in distribution
Let $X$ and $Y$ be two random variables. We say that $X$ and $Y$ are equal in distribution, and write $X \overset{D}{=} Y$, iff $F_X (x) = F_Y (x)$ for all $x \in \mathbb{R}$.

It is important to note that while $X$ and $Y$ may be equal in distribution, they may be quite different ($X \overset{D}{=} Y$ but $X \neq Y$). For example, let

\[F_X (x) = \begin{cases} 0 & \text{ if } x < 0 \\ x & \text{ if } 0 \leq x < 1 \\ 1 & \text{ if } x \geq 1 \end{cases}\]

and define $Y = 1 - X$. Then $Y$ has the same cdf, so $X \overset{D}{=} Y$, yet $X \neq Y$ (indeed, $X = Y$ only when $X = 1/2$, an event of probability zero).
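A quick numerical check of this example: since $X$ is continuous, $F_Y(y) = P(1 - X \leq y) = P(X \geq 1 - y) = 1 - F_X(1 - y)$, and this agrees with $F_X$ everywhere (the grid of test points below is arbitrary):

```python
# The uniform cdf from the example above.
def F_X(x):
    return 0.0 if x < 0 else (x if x < 1 else 1.0)

# cdf of Y = 1 - X, derived by hand; valid because X is continuous.
def F_Y(y):
    return 1 - F_X(1 - y)

# F_X and F_Y agree on a grid of points, so X and Y are equal in
# distribution, even though X != Y as random variables.
grid = [-0.5, 0.0, 0.25, 0.5, 0.9, 1.0, 2.0]
print(all(abs(F_X(t) - F_Y(t)) < 1e-12 for t in grid))  # True
```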


$\mathbf{Thm\ 1.5.1.}$ Let $X$ be a random variable with cdf $F(x)$. Then
(a) For all $a$ and $b$, if $a < b$, then $F(a) \leq F(b)$ ( $F$ is nondecreasing )
(b) $\displaystyle \lim_{x \to -\infty} F(x) = 0$ (lower limit of $F$ is 0)
(c) $\displaystyle \lim_{x \to \infty} F(x) = 1$ (upper limit of $F$ is 1)
(d) $\displaystyle \lim_{x \to x_0 +} F(x) = F(x_0)$ ($F$ is right continuous)

$\mathbf{Proof.}$

(a): As $a < b$, $\{X \leq a\} \subset \{X \leq b\}$. Then, done by Thm 1.3.3.
(c): https://proofwiki.org/wiki/Limit_of_Cumulative_Distribution_Function_at_Positive_Infinity
(d): Let $\{ x_n \}$ be any sequence of real numbers decreasing to $x_0$. Let $C_n = \{ X \leq x_n \}$. Then, $\{C_n\}$ is nonincreasing and $\bigcap_{n=1}^{\infty} C_n = \{ X \leq x_0 \}$. By Thm 1.3.8, done $._\blacksquare$



$\mathbf{Thm\ 1.5.2.}$ Let $X$ be a random variable with cdf $F(x)$. Then for $a < b$,

\[\begin{align*} P[a < X \leq b] = F_X (b) - F_X (a) \end{align*}\]
$\mathbf{Proof.}$

Note that

\[\begin{align*} \{-\infty < X \leq b \} = \{-\infty < X \leq a \} \cup \{ a < X \leq b \}. \end{align*}\]

The proof of the result follows immediately $._\blacksquare$



$\mathbf{Thm\ 1.5.3.}$ For any random variable $X$,

\[\begin{align*} P[X = x] = F_X (x) - F_X(x-) \end{align*}\]

for all $x \in \mathbb{R}$, where $F_X (x-) = \displaystyle \lim_{z \to x-} F_X (z)$.

$\mathbf{Proof.}$

For any $x \in \mathbb{R}$, we have

\[\begin{align*} \{ x \} = \bigcap_{n=1}^{\infty} (x - \frac {1}{n}, x] \end{align*}\]

That is, $\{x\}$ is the limit of a nonincreasing sequence of sets. By Thm 1.3.8,

\[\begin{align*} P[X=x] &= \displaystyle \lim_{n \to \infty} P [x - \frac{1}{n} < X \leq x] = \displaystyle \lim_{n \to \infty} [F_X (x) - F_X (x - (1/n))] \\ &=F_X(x) - F_X (x-)._\blacksquare \end{align*}\]
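Applied to a fair-die cdf as in Ex 1.5.3, the jump at a support point recovers the pmf, while continuity points give zero. A sketch approximating $F_X(x-)$ by evaluating just to the left of $x$ (the tolerance `eps` is an illustrative choice):

```python
from fractions import Fraction

# Fair-die cdf, as in Ex 1.5.3.
def F(x):
    return Fraction(sum(1 for i in range(1, 7) if i <= x), 6)

def P_eq(x, eps=Fraction(1, 10**9)):
    """Approximate the jump F(x) - F(x-) by evaluating just left of x."""
    return F(x) - F(x - eps)

print(P_eq(3))               # 1/6: x = 3 is a jump point, so P[X = 3] = 1/6
print(P_eq(Fraction(7, 2)))  # 0: F is continuous at x = 3.5, so P[X = 3.5] = 0
```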




Reference

[1] Hogg, R., McKean, J. & Craig, A., Introduction to Mathematical Statistics, Pearson 2019
[2] Ash, Robert B. Basic probability theory. Courier Corporation, 2008
