
$K$ Nearest Neighbors; KNN

$K$ Nearest Neighbors Algorithm

The $K$-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used for both classification and regression problems. It rests on the assumption that similar things exist in close proximity; in other words, similar data points lie near one another in feature space.

Consider a new, unlabeled data point whose class we want to predict. KNN assigns the point to the class held by the majority of its $K$ nearest neighbors in the training set.



When used for regression, the predicted value for a new data point $z$ is the average of the values of its $K$ nearest neighbors in the training set.
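As a concrete illustration, below is a minimal from-scratch sketch of both uses of KNN, assuming only NumPy; the function name `knn_predict` and the toy data are made up for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3, classify=True):
    # Distance from the new point to every training point (Euclidean here).
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points.
    idx = np.argsort(dists)[:k]
    neighbors = y_train[idx]
    if classify:
        # Classification: majority vote among the k nearest neighbors.
        labels, counts = np.unique(neighbors, return_counts=True)
        return labels[np.argmax(counts)]
    # Regression: average of the neighbors' target values.
    return neighbors.mean()

# Toy example: three points of class 0 and two of class 1.
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 7.0], [7.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 3.0]), k=3))  # -> 0
```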

The KNN algorithm has to compute the distance between a new data point and every point in the training set, which results in a high computational cost. Several techniques exist to reduce this cost, such as the following (a small sketch of one of them, LSH, is given after the list):

  • Locality-sensitive hashing (LSH)
  • Network-based indexing
  • Optimized product quantization
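For instance, here is a minimal sketch of random-hyperplane locality-sensitive hashing, assuming only NumPy; the parameters (8 hyperplanes, 32-dimensional toy data) are illustrative choices, not a production index.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))       # toy data set: 1000 points in R^32
planes = rng.normal(size=(8, 32))     # 8 random hyperplanes -> 8-bit hash codes

def lsh_hash(v):
    # Sign pattern of the projections onto the random hyperplanes.
    return tuple((planes @ v > 0).astype(int))

# Index: group point indices by their hash code.
buckets = defaultdict(list)
for i, x in enumerate(X):
    buckets[lsh_hash(x)].append(i)

# Query: only points in the same bucket are searched exactly,
# instead of computing all 1000 distances.
query = rng.normal(size=32)
candidates = buckets[lsh_hash(query)]
print(f"{len(candidates)} candidate neighbors out of {len(X)} points")
```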
$\mathbf{Example.}$ Decision boundary comparison with different $K$

A small $K$ produces a flexible, jagged decision boundary that closely follows individual training points, whereas a larger $K$ averages over more neighbors and yields a smoother boundary.

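A figure like the one described above can be reproduced with the following sketch, assuming scikit-learn and matplotlib are available; the moon-shaped toy data and the values $K \in \{1, 15, 50\}$ are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Evaluation grid covering the feature space.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, k in zip(axes, [1, 15, 50]):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)         # decision regions
    ax.scatter(X[:, 0], X[:, 1], c=y, s=15)   # training points
    ax.set_title(f"$K = {k}$")
plt.show()
```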



Distance metrics

Similarity can be measured in terms of distance. There are a number of different distance metrics as given below.

Euclidean Distance ($L_2$ Distance)

For $X = (x_1, \cdots, x_n)$ and $Y = (y_1, \cdots, y_n)$,

$d_{\text{Euclidean}(X, Y)} = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}$

Manhattan Distance ($L_1$ Distance)

$d_{\text{Manhattan}(X, Y)} = \sum_{i=1}^n |x_i - y_i|$
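As a small check of the two formulas above (assuming NumPy, used here purely for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

euclidean = np.linalg.norm(x - y)        # sqrt((1-2)^2 + (2-0)^2 + (3-3)^2) = sqrt(5)
manhattan = np.sum(np.abs(x - y))        # |1-2| + |2-0| + |3-3| = 3
print(euclidean, manhattan)
```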

Mahalanobis Distance

It takes into account the covariances of the data set. For two vectors $\mathbf{x}$ and $\mathbf{y}$,

$d_{\text{Mahalanobis}(\mathbf{x}, \mathbf{y})} = \sqrt{ (\mathbf{x} - \mathbf{y})^T \Sigma^{-1} (\mathbf{x} - \mathbf{y}) }$

where $\Sigma$ denotes the covariance matrix of the data set.

To understand the Mahalanobis distance, consider two points $\mathbf{x} = (x_1, x_2)$ and $\mathbf{y} = (0, 0)$ in $\mathbf{R}^2$, and write the inverse covariance matrix as

$\Sigma^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$

with $a_{12} = a_{21}$. If $d_{\text{Mahalanobis}(\mathbf{x}, \mathbf{y})} = d$, then

$x_1^2 a_{11} + 2 x_1 x_2 a_{12} + x_2^2 a_{22} = d^2$

so the set of points at a fixed Mahalanobis distance $d$ from the origin forms an ellipse whose shape and orientation are determined by $\Sigma$.
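As a small numerical sketch (assuming NumPy and SciPy), the distance can be computed either with `scipy.spatial.distance.mahalanobis` or directly from the formula; the toy covariance matrix below is made up.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
# Toy data drawn from a correlated 2-D Gaussian.
data = rng.multivariate_normal(mean=[0, 0], cov=[[2.0, 0.8], [0.8, 1.0]], size=500)

cov_inv = np.linalg.inv(np.cov(data, rowvar=False))   # Sigma^{-1} estimated from the data
x, y = np.array([1.0, 1.0]), np.array([0.0, 0.0])

d_scipy = mahalanobis(x, y, cov_inv)
d_manual = np.sqrt((x - y) @ cov_inv @ (x - y))       # the formula written out
print(d_scipy, d_manual)                              # the two values agree
```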



Pearson Correlation Distance

For $\mathbf{x} = (x_1, \cdots, x_n)$ and $\mathbf{y} = (y_1, \cdots, y_n)$

$d_{\text{Corr}(\mathbf{x, y})} = 1 - \rho_{\mathbf{xy}} \in [0, 2]$

where $\overline{x} = \frac{1}{n} \sum_{i=1}^n x_i, \overline{y} = \frac{1}{n} \sum_{i=1}^n y_i,$ and

$\rho_{\mathbf{xy}} = \frac {\sum_{i=1}^n (x_i - \overline{x}) (y_i - \overline{y})} {\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2} \sqrt{\sum_{i=1}^n (y_i - \overline{y})^2}}$
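A quick sketch (assuming NumPy) comparing `np.corrcoef` with the definition above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

rho = np.corrcoef(x, y)[0, 1]          # Pearson correlation coefficient
d_corr = 1 - rho                       # correlation distance, lies in [0, 2]

# Direct computation from the definition.
xc, yc = x - x.mean(), y - y.mean()
rho_manual = np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))
print(d_corr, 1 - rho_manual)
```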



Spearman’s Rank Correlation

Spearman’s rank correlation measures correlation on the ranks of the data rather than on the raw values:

$\rho_{\mathbf{xy}}^{(r)} = \frac {\sum_{i=1}^n \sum_{j=1}^n (r_j - r_i)(s_j - s_i)} {\sum_{i=1}^n \sum_{j=1}^n (r_j - r_i)^2}$

where $r_i$ and $s_i$ denote the ranks of $x_i$ and $y_i$ in $\mathbf{x}$ and $\mathbf{y}$, respectively.

$\mathbf{Note.}$ When there are no ties, we see that

$\rho_{\mathbf{xy}}^{(r)} = 1 - \frac{6}{n(n^2-1)} \sum_{i=1}^n (r_i - s_i)^2$


To see this, expand the double sum:

$\sum_{i=1}^n \sum_{j=1}^n (r_j - r_i) (s_j - s_i)$
$= \sum_{i=1}^n \sum_{j=1}^n r_i s_i + \sum_{i=1}^n \sum_{j=1}^n r_j s_j - \sum_{i=1}^n \sum_{j=1}^n r_i s_j - \sum_{i=1}^n \sum_{j=1}^n r_j s_i$
$= 2n \sum_{i=1}^n r_i s_i - 2 \sum_{i=1}^n r_i \sum_{j=1}^n s_j$
$= 2n \sum_{i=1}^n r_i s_i - 2 (\frac{1}{2} n(n+1))^2$
$= 2n \sum_{i=1}^n r_i s_i - \frac{1}{2} n^2 (n+1)^2$

Since the ranks are permutations of $1, \cdots, n$ (no ties), $\sum_{i=1}^n (r_i - s_i)^2 = 2 \sum_{i=1}^n r_{i}^2 - 2 \sum_{i=1}^n r_i s_i$, and hence
$\sum_{i=1}^n \sum_{j=1}^n (r_j - r_i) (s_j - s_i)$
$= 2n \sum_{i=1}^n r_{i}^2 - \frac{1}{2} n^2 (n+1)^2 - n \sum_{i=1}^n (r_i - s_i)^2$
$= \frac{1}{6} n^2 (n^2 - 1) - n \sum_{i=1}^n (r_i - s_i)^2$.

Further,

$\sum_{i=1}^n \sum_{j=1}^n (r_j - r_i)^2 = 2n \sum_{i=1}^n r_{i}^2 - 2 \sum_{i=1}^n \sum_{j=1}^n r_i r_j$
$= 2n \sum_{i=1}^n r_{i}^2 - 2 (\sum_{i=1}^n r_i)^2 = \frac{1}{6} n^2 (n^2 - 1)$

Substituting these two results into the formula for $\rho_{\mathbf{xy}}^{(r)}$ yields the stated identity. $_\blacksquare$
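The identity can also be checked numerically; the sketch below (assuming NumPy and SciPy) compares `scipy.stats.spearmanr` with the no-ties formula on tie-free toy data.

```python
import numpy as np
from scipy.stats import spearmanr, rankdata

rng = np.random.default_rng(0)
x = rng.permutation(20).astype(float)      # distinct values, so no ties
y = x + rng.normal(scale=5.0, size=20)     # noisy, roughly monotone relation

r, s = rankdata(x), rankdata(y)
n = len(x)
rho_formula = 1 - 6 * np.sum((r - s) ** 2) / (n * (n**2 - 1))

rho_scipy, _ = spearmanr(x, y)
print(rho_formula, rho_scipy)              # the two values agree
```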

