
QR Factorization

For many applications, we are interested in the nested sequence of spaces spanned by the columns of a matrix:

\[\begin{align*} \left\langle\mathbf{a}_1 \right\rangle \subseteq \left\langle \mathbf{a}_1, \mathbf{a}_2 \right\rangle \subseteq \left\langle \mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3 \right\rangle \subseteq \cdots \end{align*}\]

The QR factorization of a matrix constructs a sequence of orthonormal vectors \(\mathbf{q}_1, \mathbf{q}_2, \cdots\) that span these successive spaces.

Orthogonality

$\mathbf{Def.}$ Orthogonality, Orthonormality
Vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_k\) are (mutually) orthogonal if

\[\begin{align*} \left\langle\mathbf{v}_i, \mathbf{v}_j\right\rangle = 0 \; (i \neq j) \end{align*}\]

If, in addition, each vector is a unit vector, the set is called orthonormal.

$\mathbf{Thm\ 1.}$ Any orthogonal set of nonzero vectors is linearly independent.

$\mathbf{Proof}$

Suppose

\[\begin{align*} c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = 0. \end{align*}\]

Then, taking the inner product of both sides with \(\mathbf{v}_i\),

\[\begin{align*} &c_1 \left\langle \mathbf{v}_1, \mathbf{v}_i \right\rangle + \cdots + c_k \left\langle\mathbf{v}_k, \mathbf{v}_i\right\rangle = 0 \\ &c_i \left\langle\mathbf{v}_i, \mathbf{v}_i \right\rangle = 0 \Rightarrow c_i = 0._\blacksquare \end{align*}\]



$\mathbf{Thm\ 2.}$ Let \(\{ \mathbf{v}_1, \ldots, \mathbf{v}_n \}\) be an orthonormal basis for a vector space $V$. For any vector $\mathbf{v} \in V$,

\[\begin{align*} \mathbf{v} &= \left\langle\mathbf{v}, \mathbf{v}_1\right\rangle \mathbf{v}_1 + \cdots + \left\langle\mathbf{v}, \mathbf{v}_n\right\rangle \mathbf{v}_n \\ &= \sum_{i=1}^n \left\langle\mathbf{v}, \mathbf{v}_i\right\rangle \mathbf{v}_i \end{align*}\]

and this representation is unique.

$\mathbf{Proof}$

Since the basis spans the space, we have \(\mathbf{v} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n\) for some real $x_i$'s.

From the orthogonality and normality,

\[\begin{align*} \left\langle\mathbf{v}, \mathbf{v}_j\right\rangle=\left\langle\sum_{i=1}^n x_i \mathbf{v}_i, \mathbf{v}_j\right\rangle=\sum_{i=1}^n x_i\left\langle\mathbf{v}_i, \mathbf{v}_j\right\rangle=x_j\left\langle\mathbf{v}_j, \mathbf{v}_j\right\rangle=x_j, \text { for each } j=1, \ldots, n \end{align*}\]



From $\mathbf{Thm\ 2.}$, if there exists an orthonormal basis for a vector space $V$, the inner products of a vector with the basis vectors play the role of coordinates.
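As a quick numerical sketch of this fact (the orthonormal basis of $\mathbb{R}^2$ below is an assumption chosen purely for illustration), the coordinates of a vector are just the inner products with the basis vectors:

```python
import numpy as np

# An orthonormal basis of R^2 (a 45-degree rotation of the standard basis).
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)

v = np.array([3.0, 1.0])

# Coordinates are the inner products <v, v_i>, as in Thm 2.
x1, x2 = v @ v1, v @ v2

# Reconstruct v from its coordinates.
v_rec = x1 * v1 + x2 * v2
assert np.allclose(v_rec, v)
```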

Gram-Schmidt process

Then, can we obtain an orthonormal basis for any vector space? Through the Gram-Schmidt process, an arbitrary set of linearly independent vectors \(\mathbf{w}_1, \ldots, \mathbf{w}_k\) can be converted into an orthonormal set:

$\mathbf{Def.}$ Gram-Schmidt process
1. Let \(\mathbf{v}_1 = \frac{\mathbf{w}_1}{\| \mathbf{w}_1 \|}\)
2. Obtain \(\mathbf{v}_2\) by subtracting from \(\mathbf{w}_2\) its projection onto the subspace \(W_1\) spanned by \(\mathbf{v}_1\):

\[\begin{align*} \mathbf{v}_2 = \mathbf{w}_2 - \text{proj}_{W_1} \mathbf{w}_2 = \mathbf{w}_2 - \left\langle\mathbf{v}_1, \mathbf{w}_2 \right\rangle \mathbf{v}_1 \end{align*}\]

and normalize it. Note that \(\mathbf{v}_2 \neq 0\): otherwise \(\mathbf{w}_2 = k \mathbf{w}_1\) for some scalar $k$, contradicting linear independence.



3. Obtain \(\mathbf{v}_3\) by subtracting from \(\mathbf{w}_3\) its projection onto the subspace \(W_2\) spanned by \(\mathbf{v}_1\) and \(\mathbf{v}_2\):

\[\begin{align*} \mathbf{v}_3 = \mathbf{w}_3 - \text{proj}_{W_2} \mathbf{w}_3 = \mathbf{w}_3 - \left\langle\mathbf{v}_1, \mathbf{w}_3 \right\rangle \mathbf{v}_1 - \left\langle\mathbf{v}_2, \mathbf{w}_3 \right\rangle \mathbf{v}_2 \end{align*}\]

and normalize it. As before, \(\mathbf{v}_3 \neq 0\) by linear independence.



4. Continuing in this way produces an orthonormal set \(\{ \mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_k \}\) after $k$ steps.
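The steps above can be sketched in NumPy; the function name `gram_schmidt` and the example matrix are assumptions for illustration:

```python
import numpy as np

def gram_schmidt(W):
    """Orthonormalize the columns of W (assumed linearly independent),
    following the steps above; returns V with orthonormal columns."""
    V = np.zeros_like(W, dtype=float)
    for j in range(W.shape[1]):
        v = W[:, j].copy()
        # Subtract the projection of w_j onto each previous v_i.
        for i in range(j):
            v -= (V[:, i] @ W[:, j]) * V[:, i]
        # v != 0 by linear independence, so normalization is safe.
        V[:, j] = v / np.linalg.norm(v)
    return V

W = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(W)
assert np.allclose(V.T @ V, np.eye(3))  # columns are orthonormal
```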

QR Factorization

Returning to the main point, we want

\[\begin{align*} \left\langle \mathbf{q}_1, \cdots, \mathbf{q}_j \right\rangle = \left\langle \mathbf{a}_1, \cdots, \mathbf{a}_j \right\rangle \; (j = 1, 2, \cdots, n) \end{align*}\]

Then, with some constants $r_{ij}$,

\[\begin{align*} &\mathbf{a}_1 = r_{11} \mathbf{q}_1 \\ &\mathbf{a}_2 = r_{12} \mathbf{q}_1 + r_{22} \mathbf{q}_2 \\ &\mathbf{a}_3 = r_{13} \mathbf{q}_1 + r_{23} \mathbf{q}_2 + r_{33} \mathbf{q}_3 \\ &\vdots \\ &\mathbf{a}_n = r_{1n} \mathbf{q}_1 + r_{2n} \mathbf{q}_2 + \cdots + r_{nn} \mathbf{q}_n \\ \end{align*}\]

Thus, any $A \in \mathbb{C}^{m \times n} \; (m \geq n)$ can be factorized as $A = \hat{Q} \hat{R}$:

\[\left[\mathbf{a}_1 \; | \; \cdots \; | \; \mathbf{a}_n\right]=\left[\mathbf{q}_1 \; | \; \cdots \; | \; \mathbf{q}_n\right]\left[\begin{array}{cccc} r_{11} & r_{12} & \cdots & r_{1 n} \\ & r_{22} & \ldots & r_{2 n} \\ & & \ddots & \vdots \\ & & & r_{n n} \end{array}\right]\]

where $\hat{Q}$ is an $m \times n$ matrix with orthonormal columns; this is called the reduced QR factorization. By appending $m - n$ additional orthonormal columns to $\hat{Q}$ and padding $\hat{R}$ with rows of zeros, we obtain $A = QR$ with an $m \times m$ unitary $Q$; this is called the full QR factorization.
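Both factorizations can be inspected with NumPy's `np.linalg.qr`, which supports `mode='reduced'` and `mode='complete'`; the random tall matrix below is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # m = 5, n = 3

Q_hat, R_hat = np.linalg.qr(A, mode='reduced')   # Q_hat: 5x3, R_hat: 3x3
Q, R = np.linalg.qr(A, mode='complete')          # Q: 5x5 unitary, R: 5x3

assert Q_hat.shape == (5, 3) and R_hat.shape == (3, 3)
assert Q.shape == (5, 5) and R.shape == (5, 3)
assert np.allclose(Q_hat @ R_hat, A)             # reduced factorization
assert np.allclose(Q @ R, A)                     # full factorization
assert np.allclose(Q.T @ Q, np.eye(5))           # full Q is unitary
```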

Note that Gram-Schmidt orthogonalization suggests a method for computing the factorization. It allows us to construct orthonormal \(\mathbf{q}_1, \mathbf{q}_2, \cdots\) such that \(\mathbf{q}_j \in \left\langle \mathbf{a}_1, \cdots, \mathbf{a}_j \right\rangle\):

\[\begin{align*} \mathbf{q}_1 &= \frac{\mathbf{a}_1}{r_{11}} \\ \mathbf{q}_2 &= \frac{\mathbf{a}_2 - \left\langle \mathbf{q}_1, \mathbf{a}_2 \right\rangle \mathbf{q}_1}{r_{22}} = \frac{\mathbf{a}_2 - r_{12} \mathbf{q}_1}{r_{22}}\\ & \quad \vdots \\ \mathbf{q}_n &= \frac{\mathbf{a}_n - \sum_{i=1}^{n-1} \left\langle \mathbf{q}_i, \mathbf{a}_n \right\rangle \mathbf{q}_i}{r_{nn}} = \frac{\mathbf{a}_n - \sum_{i=1}^{n-1} r_{in} \mathbf{q}_i}{r_{nn}} \\ \end{align*}\]

And it is evident that

\[\begin{align*} r_{ij} = \left\langle \mathbf{q}_i, \mathbf{a}_j \right\rangle \; (i < j), \qquad r_{jj} = \Big\| \mathbf{a}_j - \sum_{i=1}^{j-1} r_{ij} \mathbf{q}_i \Big\| \end{align*}\]
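These formulas translate almost line by line into a classical Gram-Schmidt QR. A minimal sketch in NumPy, where the function name `clgs_qr` and the test matrix are assumptions:

```python
import numpy as np

def clgs_qr(A):
    """Classical Gram-Schmidt QR, assuming A has full column rank
    so that every r_jj > 0."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = <q_i, a_j>
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)       # r_jj = ||a_j - sum r_ij q_i||
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])
Q, R = clgs_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.all(np.diag(R) > 0)
```

In floating-point arithmetic this classical variant is known to lose orthogonality for ill-conditioned matrices; modified Gram-Schmidt or Householder reflections are preferred in practice.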



Thus, the Gram-Schmidt process ensures the existence (and, under a suitable restriction, uniqueness) of the QR factorization of any matrix:

$\mathbf{Thm\ 1.1.}$ Existence
Every $A \in \mathbb{C}^{m \times n} \; (m \geq n)$ has a full QR factorization (hence a reduced QR factorization).

$\mathbf{Proof.}$

Note that Gram-Schmidt may fail when it encounters \(\mathbf{v}_j = \mathbf{0}\), which means

\[\begin{align*} \mathbf{a}_j \in \left\langle \mathbf{q}_1, \cdots, \mathbf{q}_{j-1} \right\rangle = \left\langle \mathbf{a}_1, \cdots, \mathbf{a}_{j-1} \right\rangle \end{align*}\]

Thus, existence is obvious when $A$ has full rank. If not, at this moment just pick an arbitrary unit vector \(\mathbf{q}_j\) that is orthogonal to \(\left\langle \mathbf{q}_1, \cdots, \mathbf{q}_{j-1} \right\rangle\).



$\mathbf{Thm\ 1.2.}$ Uniqueness Each $A \in \mathbb{C}^{m \times n} \; (m \geq n)$ of full rank has a unique reduced QR factorization $A=\hat{Q} \hat{R}$ with $r_{ii}>0$.

$\mathbf{Proof.}$

By the assumption of full rank, $r_{jj}$ cannot be $0$. So, at each successive step $j$, the Gram-Schmidt process fully determines $r_{ij}$ and \(\mathbf{q}_j\). Moreover, the convention $r_{jj} > 0$ fixes the sign of each $r_{jj}$, removing the remaining ambiguity.$_\blacksquare$
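A small numerical illustration of this uniqueness (the random matrix is an assumed example): any reduced QR can be normalized so that each $r_{ii} > 0$ by flipping signs of matching columns of $\hat{Q}$ and rows of $\hat{R}$, yielding the unique factorization of $\mathbf{Thm\ 1.2.}$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))      # full rank with probability 1

Q, R = np.linalg.qr(A)               # reduced QR; signs of diag(R) arbitrary
s = np.sign(np.diag(R))              # flip column of Q / row of R together
Q, R = Q * s, s[:, None] * R         # (QD)(DR) = QR since D^2 = I

assert np.allclose(Q @ R, A)
assert np.all(np.diag(R) > 0)        # the normalized, unique factorization
```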





