
Pseudoinverse

Motivation: Least Squares Problem

Consider a linear system of equations with $n$ unknowns, given by a matrix $A \in \mathbb{C}^{m \times n}$. We want to find $\mathbf{x} \in \mathbb{C}^n$ such that $A \mathbf{x} = \mathbf{b}$, where $\mathbf{b} \in \mathbb{C}^m$; however, such $\mathbf{x}$ exists only if $\mathbf{b} \in \text{range}(A)$. Thus, instead, we wish to minimize the residual $\mathbf{r} = \mathbf{b} - A \mathbf{x}$ by a suitable choice of $\mathbf{x}$, and this is called the least squares problem.

Given $A \in \mathbb{C}^{m \times n}$ and $\mathbf{b} \in \mathbb{C}^m$, find $\mathbf{x} \in \mathbb{C}^n$ such that \(\| \mathbf{b} - A \mathbf{x} \|_2\) is minimized.
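
As a quick illustration, here is a minimal NumPy sketch (using a hypothetical random $5 \times 3$ system, so the sizes and names are illustrative only) that solves this minimization directly with `np.linalg.lstsq`:

```python
import numpy as np

# A small overdetermined system (m > n): generally no exact solution,
# so we seek x minimizing ||b - A x||_2.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

x, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print("least-squares solution x:", x)
print("residual norm ||b - A x||_2:", np.linalg.norm(b - A @ x))
```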


To compute the solution, note that our goal is equivalent to finding the closest point $A \mathbf{x} \in \text{range}(A)$ to $\mathbf{b}$. In other words, we should find an orthogonal projector $P: \mathbb{C}^m \to \text{range}(A)$ such that

\[\begin{aligned} A \mathbf{x} = P\mathbf{b} \end{aligned}\]



so that $\mathbf{r} \perp \text{range}(A)$. This condition is equivalent to

\[\begin{aligned} \mathbf{r} \perp \text{range}(A) & \Leftrightarrow A^\star \mathbf{r} = 0 \\ & \Leftrightarrow A^\star A \mathbf{x} = A^\star \mathbf{b} \\ & \Leftrightarrow A \mathbf{x} = P \mathbf{b} \end{aligned}\]

So, if $A$ has full column rank (which requires $m \geq n$), then $A^\star A$ is non-singular, and the solution of the least squares problem is

\[\begin{aligned} \mathbf{x} = (A^\star A)^{-1} A^\star \mathbf{b} \end{aligned}\]

The matrix $(A^\star A)^{-1} A^\star$ is called the pseudoinverse of $A$; correspondingly, the orthogonal projector onto $\text{range}(A)$ is $P = A (A^\star A)^{-1} A^\star$. Here we only considered the special case of full column rank, but the pseudoinverse is defined for an arbitrary matrix, and it describes a generalized orthogonal projection onto $\text{range}(A)$.
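
The derivation above can be checked numerically. The following sketch (assuming a random full-column-rank real matrix, so $A^\star$ is simply the transpose) verifies that the residual is orthogonal to $\text{range}(A)$ and that $(A^\star A)^{-1} A^\star$ agrees with `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # full column rank with probability 1
b = rng.standard_normal(6)

# Solve the normal equations A* A x = A* b (real A, so A* = A.T).
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual r = b - A x is orthogonal to range(A): A* r = 0.
r = b - A @ x
print("A* r vanishes:", np.allclose(A.T @ r, 0))

# Orthogonal projector onto range(A) and the pseudoinverse of A.
P = A @ np.linalg.inv(A.T @ A) @ A.T
A_pinv = np.linalg.inv(A.T @ A) @ A.T
print("A x equals P b:", np.allclose(A @ x, P @ b))
print("matches np.linalg.pinv:", np.allclose(A_pinv, np.linalg.pinv(A)))
```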

Definition

In the motivation section, we assumed that $A$ has full column rank, but in general the pseudoinverse is defined via the SVD (we will see another, equivalent characterization of the pseudoinverse in the next section):

$\mathbf{Definition.}$ Pseudoinverse

Let $A$ be an arbitrary $m \times n$ matrix, i.e., $A \in \mathbb{C}^{m \times n}$, with $r = \text{rank}(A)$. Then we know there exists a (reduced) SVD of $A$: matrices $U \in \mathbb{C}^{m \times r}$ and $V \in \mathbb{C}^{n \times r}$ with orthonormal columns, and a diagonal matrix $\Sigma = \text{diag}(\sigma_1, \cdots, \sigma_r)$ with $\sigma_1 \geq \cdots \geq \sigma_r > 0$, such that $A = U \Sigma V^\star$. Then, the $n \times m$ matrix

\[\begin{aligned} A^+ = V \Sigma^{-1} U^\star \end{aligned}\]

is called the pseudoinverse, or Moore-Penrose generalized inverse, of $A$.

And, if $A \in \mathbb{C}^{m \times n}$ has full column rank, in other words $\text{rank}(A) = n$ with $m \geq n$, then the pseudoinverse can be written as

\[\begin{aligned} A^+ = (A^\star A)^{-1} A^\star \end{aligned}\]

This is because, since $A$ has full column rank, $r = n$, so $V \in \mathbb{C}^{n \times n}$ is unitary ($V^{-1} = V^\star$) and $\Sigma$ is invertible. Thus $A^\star A = V \Sigma^2 V^\star$ is non-singular with $(A^\star A)^{-1} = V \Sigma^{-2} V^\star$, and hence $(A^\star A)^{-1} A^\star = V \Sigma^{-2} V^\star \cdot V \Sigma U^\star = V \Sigma^{-1} U^\star = A^+$.
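
A minimal NumPy sketch of this definition (using a hypothetical rank-deficient matrix, so the full-column-rank formula above would not even apply) builds $A^+ = V \Sigma^{-1} U^\star$ from the reduced SVD and compares it with `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(2)
# A deliberately rank-deficient 5x4 matrix (rank 2), so A* A is singular.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

# Reduced SVD, keeping only the r nonzero singular values.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12 * s[0]))          # numerical rank
U_r, s_r, V_r = U[:, :r], s[:r], Vh[:r, :].conj().T

# A+ = V Sigma^{-1} U*
A_pinv = V_r @ np.diag(1.0 / s_r) @ U_r.conj().T
print("matches np.linalg.pinv:", np.allclose(A_pinv, np.linalg.pinv(A)))
```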

Uniqueness

To prove the uniqueness of the pseudoinverse, we introduce some special identities described by Roger Penrose:

$\mathbf{Definition.}$ Penrose identities

An $m \times n$ matrix $A$ and an $n \times m$ matrix $B$ satisfy the Penrose identities if the following three conditions hold:

(a) $(AB)^\star = AB$ and $(BA)^\star = BA$
(b) $ABA = A$
(c) $BAB = B$

It is known that, for any matrix $A$, the matrix satisfying the Penrose identities for $A$ is unique.

$\mathbf{Thm\ 1.}$ The matrix satisfying the Penrose identities for $A$ is unique.

$\mathbf{Proof.}$

Suppose $B$ and $C$ satisfy the Penrose identities with regard to $A$. Then,

\[\begin{aligned} B & = BAB = B(AB)^\star = BB^\star A^\star = BB^\star (ACA)^\star = B (AB)^\star (AC)^\star \\ & = BABAC = BAC = A^\star B^\star C = A^\star C^\star A^\star B^\star C = A^\star C^\star BAC \\ & = CABAC = CAC = C._\blacksquare \end{aligned}\]


Hence, with this theorem, we can say that the pseudoinverse of $A$ is unique, since it satisfies the Penrose identities: indeed, $A A^+ = U U^\star$ and $A^+ A = V V^\star$ are Hermitian, $A A^+ A = U \Sigma V^\star = A$, and $A^+ A A^+ = V \Sigma^{-1} U^\star = A^+$.

$\mathbf{Thm\ 2.}$ Pseudoinverse of matrix $A \in \mathbb{C}^{m \times n}$ is unique.
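
As a quick sanity check of this characterization (a sketch only, using a hypothetical random rank-deficient matrix), one can verify numerically that `np.linalg.pinv(A)` satisfies all three Penrose identities:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank-deficient example
B = np.linalg.pinv(A)

# (a) AB and BA are Hermitian, (b) ABA = A, (c) BAB = B.
print("(AB)* = AB:", np.allclose((A @ B).conj().T, A @ B))
print("(BA)* = BA:", np.allclose((B @ A).conj().T, B @ A))
print("ABA = A:", np.allclose(A @ B @ A, A))
print("BAB = B:", np.allclose(B @ A @ B, B))
```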

Generalized Orthogonal Projection

Consider the least squares problem again. Since $A A^+ = U U^\star$, we have $A^\star (A A^+ - I) = V \Sigma U^\star (U U^\star - I) = 0$, so

\[\begin{aligned} \| A \mathbf{x} - \mathbf{b} \|_2^2 & = \| A (\mathbf{x} - A^+ \mathbf{b}) + (AA^+ \mathbf{b} - \mathbf{b}) \|_2^2 \\ & = \| A (\mathbf{x} - A^+ \mathbf{b}) \|_2^2 + 2 \operatorname{Re}\left[ (\mathbf{x} - A^+ \mathbf{b})^\star A^\star (AA^+ - I)\mathbf{b} \right] + \| AA^+ \mathbf{b} - \mathbf{b} \|_2^2 \\ & = \| A (\mathbf{x} - A^+ \mathbf{b}) \|_2^2 + \| AA^+ \mathbf{b} - \mathbf{b} \|_2^2 \\ & \geq \| AA^+ \mathbf{b} - \mathbf{b} \|_2^2 \end{aligned}\]

Thus, even without assuming that $A$ has full column rank, a solution of the least squares problem is $\mathbf{x} = A^+ \mathbf{b}$ (in fact, the minimum-norm solution).
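
Here is a minimal NumPy sketch of this claim, with a hypothetical rank-deficient $A$ so that $A^\star A$ is singular and the normal-equation formula breaks down; $A^+ \mathbf{b}$ still attains the minimal residual and coincides with the minimum-norm solution returned by `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank 2: not full column rank
b = rng.standard_normal(6)

# x = A+ b minimizes ||A x - b||_2 even though A* A is singular.
x = np.linalg.pinv(A) @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm least-squares solution

print("same minimal residual:",
      np.isclose(np.linalg.norm(A @ x - b), np.linalg.norm(A @ x_lstsq - b)))
print("same minimum-norm solution:", np.allclose(x, x_lstsq))
```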

In an analogous way, this fact extends to systems with multiple right-hand sides, where the Euclidean norm is replaced by the Frobenius norm.

$\mathbf{Thm\ 3.}$ For $A \in \mathbb{C}^{m \times n}$,

\[\begin{aligned} A^{+}=\underset{X \in \mathbb{C}^{n \times m}}{\operatorname{argmin}}\left\|A X-I_m\right\|_F=\underset{Y \in \mathbb{C}^{n \times m}}{\operatorname{argmin}}\left\|Y A-I_n\right\|_F \end{aligned}\]
$\mathbf{Proof.}$

Note that \(X \mathbf{e}_j\) is a vector (the $j$-th column of $X$). Then, applying the previous inequality with $\mathbf{b} = \mathbf{e}_j$,

\[\begin{aligned} \|A X \mathbf{e}_j-\mathbf{e}_j\|_2 \geq\|A A^{+} \mathbf{e}_j-\mathbf{e}_j\|_2 \end{aligned}\]

Hence,

\[\begin{aligned} \left\|A X-I_m\right\|_F^2=\sum_{j=1}^m \| A X \mathbf{e}_j-\mathbf{e}_j \|_2^2 \geq \sum_{j=1}^m \|A A^{+} \mathbf{e}_j-\mathbf{e}_j \|_2^2=\left\|A A^{+}-I_m\right\|_F^2 \end{aligned}\]

Analogously, set $A^- = (A^+)^\star$; then $A^\star A^- = V \Sigma U^\star U \Sigma^{-1} V^\star = VV^\star$, and thus $A(A^\star A^- - I) = U \Sigma V^\star (V V^\star - I) = 0$. It implies

\[\begin{aligned} \|A^{\star} \mathbf{y}-\mathbf{c}\|_2^2 & =\|A^{\star}\left(\mathbf{y}-A^{-} \mathbf{c}\right)+\left(A^{\star} A^{-} \mathbf{c}-\mathbf{c}\right)\|_2^2 \\ & =\|A^{\star}\left(\mathbf{y}-A^{-} \mathbf{c}\right)\|_2^2+2 \operatorname{Re}\left[\left(\mathbf{y}-A^{-} \mathbf{c}\right)^{\star} A\left(A^{\star} A^{-}-I\right) \mathbf{c}\right]+\|A^{\star} A^{-} \mathbf{c}-\mathbf{c}\|_2^2 \\ & =\|A^{\star}\left(\mathbf{y}-A^{-} \mathbf{c}\right)\|_2^2+\|A^{\star} A^{-} \mathbf{c}-\mathbf{c}\|_2^2 \\ & \geq\|A^{\star} A^{-} \mathbf{c}-\mathbf{c}\|_2^2 \end{aligned}\]

Finally, applying this bound row by row (with $\mathbf{y}^\star = \mathbf{e}_i^\star Y$ and $\mathbf{c} = \mathbf{e}_i$), and using $A^\star A^- = VV^\star = A^+ A$, we can conclude in the same way as before:

\[\begin{aligned} \left\|Y A-I_n\right\|_F^2=\sum_{i=1}^n\|\mathbf{e}_i^{\star} Y A-\mathbf{e}_i^{\star}\|_2^2 \geq \sum_{i=1}^n\|\mathbf{e}_i^{\star} A^{+} A-\mathbf{e}_i^{\star}\|_2^2=\left\|A^{+} A-I_n\right\|_F^2._\blacksquare \end{aligned}\]
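
As a rough numerical sanity check of this theorem (a sketch comparing against randomly drawn trial matrices rather than a proof), $A^+$ should never be beaten in either Frobenius-norm objective:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 3
A = rng.standard_normal((m, n))
A_pinv = np.linalg.pinv(A)

# ||A A+ - I_m||_F is no larger than ||A X - I_m||_F for any trial X.
best = np.linalg.norm(A @ A_pinv - np.eye(m), "fro")
trials = [np.linalg.norm(A @ rng.standard_normal((n, m)) - np.eye(m), "fro")
          for _ in range(1000)]
print("right-sided check:", best <= min(trials) + 1e-12)

# Same for the left-sided objective ||Y A - I_n||_F.
best_left = np.linalg.norm(A_pinv @ A - np.eye(n), "fro")
trials_left = [np.linalg.norm(rng.standard_normal((n, m)) @ A - np.eye(n), "fro")
               for _ in range(1000)]
print("left-sided check:", best_left <= min(trials_left) + 1e-12)
```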




Reference

[1] Lloyd N. Trefethen & David Bau III, Numerical Linear Algebra, 1997
[2] Wikipedia, Moore–Penrose inverse
