Vector Norms
5 minute read
In this section, we will learn about vector norms.
Vector Norm:
It is a measure of the size of a vector, or its distance from the origin.
A vector norm is a function that maps a vector to a real number, i.e., \({\| \cdot \|} : \mathbb{R}^n \rightarrow \mathbb{R}\).
A vector norm should satisfy the following 3 properties:
- Non-Negativity:
Norm is always greater than or equal to zero,
\( {\| x \|} \ge 0\), and \( {\| x \|} = 0\) if and only if \(\vec{x} = \vec{0}\).
- Homogeneity (or Scaling):
\( {\| \alpha x \|} = |\alpha| {\| x \|} \).
- Triangle Inequality:
\( {\| x + y \|} \le {\| x \|} + {\| y \|} \).
P-Norm:
It is a generalised form of the most common family of vector norms.
It is defined as:
\( {\| x \|_p} = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \), for \(p \ge 1\).
We can change the value of \(p\) to get different norms.
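As a quick sketch, the general p-norm can be computed directly and checked against NumPy's `np.linalg.norm` (assuming NumPy is available; the helper name `p_norm` below is just illustrative):

```python
import numpy as np

def p_norm(x, p):
    # General p-norm: (sum of |x_i|^p) raised to the power 1/p
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
print(p_norm(x, 1))  # 7.0 (L1 norm)
print(p_norm(x, 2))  # 5.0 (L2 norm)

# Agrees with NumPy's built-in norm for any p >= 1
print(np.isclose(p_norm(x, 3), np.linalg.norm(x, 3)))  # True
```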
L1 Norm:
It is the sum of absolute values of all the elements of a vector; also known as Manhattan distance.
p=1:
\( {\| x \|_1} = \sum_{i=1}^{n} |x_i| \)
L2 Norm:
It is the square root of the sum of squares of all the elements of a vector; also known as Euclidean distance.
p=2:
\( {\| x \|_2} = \sqrt{ \sum_{i=1}^{n} x_i^2 } \)
L-\(\infty\) Norm:
It is the maximum of absolute values of all the elements of a vector; also known as Chebyshev distance.
p=\(\infty\):
\( {\| x \|_\infty} = \max_{i} |x_i| \)
- Let, vector \(\mathbf{x} = \begin{bmatrix} 3 \\ \\ -4 \end{bmatrix}\), then
\({\| x \|_1} = |3| + |-4| = 7\)
\({\| x \|_2} = \sqrt{3^2 + (-4)^2} = \sqrt{25} = 5\)
\({\| x \|_\infty} = \max(|3|, |-4|) = \max(3, 4) = 4\)
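The three norms computed above can be verified with NumPy (a sketch, assuming NumPy is available):

```python
import numpy as np

# The vector from the example above
x = np.array([3, -4])

l1 = np.linalg.norm(x, 1)         # sum of absolute values
l2 = np.linalg.norm(x, 2)         # Euclidean length
linf = np.linalg.norm(x, np.inf)  # largest absolute entry

print(l1, l2, linf)  # 7.0 5.0 4.0
```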
Matrix Norm:
It is a function that assigns non-negative size or magnitude to a matrix.
A matrix norm is a function that maps a matrix to a non-negative real number, i.e., \({\| \cdot \|} : \mathbb{R}^{m \times n} \rightarrow \mathbb{R}\).
It should satisfy the following 3 properties:
- Non-Negativity:
Norm is always greater than or equal to zero,
\( {\| A \|} \ge 0\), and \( {\| A \|} = 0\) if and only if \(A\) is the zero matrix.
- Homogeneity (or Scaling):
\( {\| \alpha A \|} = |\alpha| {\| A \|} \).
- Triangle Inequality:
\( {\| A + B \|} \le {\| A \|} + {\| B \|} \).
There are 2 types of matrix norms:
- Element-wise norms, e.g., Frobenius norm
- Vector-induced norms
Frobenius Norm:
It is equivalent to the Euclidean norm of the matrix if it were flattened into a single vector.
If A is a matrix of size \(m \times n\), then the Frobenius norm is defined as:
\( {\| A \|_F} = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 } = \sqrt{Trace(\mathbf{A^TA})} = \sqrt{ \sum_{i} \sigma_i^2 } \)
where \(\sigma_i\) is the \(i\)th singular value of matrix A.
Vector Induced Norm:
It measures the maximum stretching a matrix can apply when multiplied with a vector,
where the vector has a unit length under the chosen vector norm.
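The "maximum stretching" picture can be sketched numerically: sample many unit vectors, apply the matrix, and take the largest resulting length (the matrix here is an arbitrary example; assuming NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # example matrix

# Sample many random unit vectors (unit Euclidean norm), apply A,
# and record how much each one gets stretched; the supremum over
# all unit vectors is the induced 2-norm of A.
v = rng.normal(size=(10_000, 2))
v /= np.linalg.norm(v, axis=1, keepdims=True)
stretch = np.linalg.norm(v @ A.T, axis=1)

print(stretch.max())  # approaches 3.0, the true induced 2-norm of this A
```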
Matrix Norm Induced by Vector P-Norm:
P-Norm:
\( {\| A \|_p} = \max_{x \ne 0} \frac{ {\| Ax \|_p} }{ {\| x \|_p} } \)
P=1 Norm:
\( {\| A \|_1} = \max_{j} \sum_{i=1}^{m} |a_{ij}| \), i.e., the maximum absolute column sum.
P=\(\infty\) Norm:
\( {\| A \|_\infty} = \max_{i} \sum_{j=1}^{n} |a_{ij}| \), i.e., the maximum absolute row sum.
P=2 Norm:
\( {\| A \|_2} = \sigma_{max}(A) \)
Also called the Spectral norm, i.e., the maximum factor by which the matrix can stretch a unit vector in the Euclidean norm.
For example:
Let, matrix \(\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ \\ a_{21} & a_{22} \end{bmatrix}\), then find Frobenius norm.
\({\| A \|_F} = \sqrt{a_{11}^2 + a_{12}^2 + a_{21}^2 + a_{22}^2}\)
\(\mathbf{A}^T = \begin{bmatrix} a_{11} & a_{21} \\ \\ a_{12} & a_{22} \end{bmatrix}\)
=> \(\mathbf{A^TA} = \begin{bmatrix} a_{11} & a_{21} \\ \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} a_{11}^2 + a_{21}^2 & a_{11} a_{12} + a_{21} a_{22} \\ \\ a_{12} a_{11} + a_{22} a_{21} & a_{12}^2 + a_{22}^2 \end{bmatrix} \)
Therefore, \(Trace(\mathbf{A^TA}) = a_{11}^2 + a_{12}^2 + a_{21}^2 + a_{22}^2\)
=> \({\| A \|_F} = \sqrt{Trace(\mathbf{A^TA})} = \sqrt{a_{11}^2 + a_{12}^2 + a_{21}^2 + a_{22}^2}\)
- Let, matrix \(\mathbf{A} = \begin{bmatrix} 1 & -2 & 3 \\ \\ 4 & 5 & -6 \end{bmatrix}\), then
Column 1 absolute value sum = |1|+|4| = 5
Column 2 absolute value sum = |-2|+|5|= 7
Column 3 absolute value sum = |3|+|-6|= 9
Row 1 absolute value sum = |1|+|-2|+|3| = 6
Row 2 absolute value sum = |4|+|5|+|-6| = 15
\({\| A \|_1} = max(5,7,9) = 9\) = max column absolute value sum.
\({\| A \|_\infty} = max(6,15) = 15\) = max row absolute value sum.
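NumPy's `np.linalg.norm` implements these matrix norms directly, which lets us check the column and row sums above (a sketch, assuming NumPy is available):

```python
import numpy as np

A = np.array([[1, -2, 3],
              [4, 5, -6]])

one_norm = np.linalg.norm(A, 1)       # max absolute column sum
inf_norm = np.linalg.norm(A, np.inf)  # max absolute row sum

print(one_norm, inf_norm)  # 9.0 15.0
```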
- Let, matrix \(\mathbf{A} = \begin{bmatrix} 2 & 1 \\ \\ 1 & 2 \end{bmatrix}\), then find spectral norm.
Spectral norm can be found using the singular value decomposition, in order to get the largest singular value.
\({\| A \|_2} = \sigma_{max}(A) \)
\(\mathbf{A} = \mathbf{U \Sigma V^T} \), where the columns of \(\mathbf{U}\) are the eigenvectors of \(\mathbf{AA^T}\), the columns of \(\mathbf{V}\) are the eigenvectors of \(\mathbf{A^TA}\), and \(\mathbf{\Sigma}\) holds the singular values on its diagonal.
Let's find the largest eigenvalue of \(\mathbf{A^TA}\), the square root of which gives the largest singular value of \(\mathbf{A}\).
\( \mathbf{A^TA} =
\begin{bmatrix} 2 & 1 \\ \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} 2 & 1 \\ \\ 1 & 2 \end{bmatrix}
= \begin{bmatrix} 5 & 4 \\ \\ 4 & 5 \end{bmatrix}
\)
Now, let's find the eigenvalues of \(\mathbf{A^TA}\):
\(det(\mathbf{A^TA}-\lambda I) = 0 \)
=> \(
det\begin{bmatrix} 5 - \lambda & 4 \\ \\ 4 & 5- \lambda \end{bmatrix} = 0
\)
=> \( (5 - \lambda)^2 - 16 = 0 \)
=> \( (5 - \lambda) = \pm 4 \)
=> \( (5 - \lambda) = 4 \) or \( (5 - \lambda) = -4 \)
=> \( \lambda = 1 \) or \( \lambda = 9 \)
=> Largest singular value = square root of the largest eigenvalue = \(\sqrt{9} = 3\)
Therefore, \({\| A \|_2} = \sigma_{max}(A) = 3\)
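The same answer can be cross-checked numerically (a sketch, assuming NumPy is available): `np.linalg.svd` returns the singular values, and the eigenvalues of \(\mathbf{A^TA}\) confirm the hand derivation above.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Spectral norm = largest singular value of A
sigma = np.linalg.svd(A, compute_uv=False)
spectral = sigma.max()

# Cross-check: square root of the largest eigenvalue of A^T A
# (eigvalsh is appropriate because A^T A is symmetric)
eigvals = np.linalg.eigvalsh(A.T @ A)

print(spectral)                # 3.0 (up to floating-point error)
print(np.sqrt(eigvals.max()))  # 3.0 as well
```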
End of Section