Norms and Condition Numbers

Introduction

This is a bit of a sandbox page with information about norms and condition numbers. The contents will likely find new homes over time.

Norms

Norms calculate a "size" for a 1-d vector or a 2-d matrix. Note that vector norms are calculated differently from matrix norms. Specifically, the 2-norm of a vector is VERY different from the 2-norm of a matrix. At the heart of the different types of norm are the following questions:

  • What is larger, 6 or 8? Generally the answer would be 8 since you compare the magnitudes of the two numbers.
  • What is larger, 6 or -8? Remember that a max function would say 6 since it is closer to +\(\infty\) but, really, -8 has a larger magnitude. That is an important distinction.
  • What is larger, [10, 10] or [1, 12]? The answer is: it depends on how we define the size of a two-element array. If we care about the sum of the magnitudes of the components, [10, 10] is larger since its magnitudes sum to 20 versus 13. If we care about the distance from the origin, [10, 10] is again larger since \(\sqrt{10^2+10^2}\approx 14.1\) while \(\sqrt{1^2+12^2}\approx 12.0\). But if we only care about the largest magnitude in the array, [1, 12] is larger than [10, 10]. This is why there are different norms.
  • For a vector (i.e. a 1d array) there are generally two different types of norms:
    • The p-norm \(||x||_p\) is \(\left(\sum_i|x_i|^p\right)^{1/p}\)
    • The \(\infty\)-norm \(||x||_{\infty}\) is the largest magnitude of any element in the vector.
  • For a matrix (i.e. a 2d array) there are generally four different types of norms:
    • The 1-norm \(||A||_1\) is the largest of the 1-norms of the columns of \(A\) (that is, the largest absolute column sum).
    • The \(\infty\)-norm \(||A||_{\infty}\) is the largest of the 1-norms of the rows of \(A\) (that is, the largest absolute row sum). Read that again. The \(\infty\)-norm of a matrix is based on the 1-norms of the rows.
    • The Frobenius norm of a matrix is the square root of the sum of the squares of all the entries in the matrix; that is, \(\sqrt{\sum_i\sum_j|A_{ij}|^2}\)
    • The 2-norm of a matrix is...complicated. It is the square root of the largest eigenvalue of \(A^*A\) (equivalently, the largest singular value of \(A\)). It's all good if that doesn't mean anything to you at the moment. Suffice it to say the 2-norm of a matrix is an overall measure of how large the entries in the matrix are and how they work with each other geometrically. One good thing to know is that the 2-norm of a matrix is never larger than its Frobenius norm and is typically the smallest of the common matrix norms. See the code sketch after this list for examples of all of these.
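
To see these definitions in action, here is a short sketch using Python and NumPy (the choice of tool is an assumption; the page itself does not tie the definitions to any particular language). The same function, np.linalg.norm, returns vector norms when handed a 1-d array and matrix norms when handed a 2-d array:

 import numpy as np

 # Vector norms for the two arrays compared above
 u = np.array([10, 10])
 v = np.array([1, 12])
 print(np.linalg.norm(u, 1), np.linalg.norm(v, 1))            # 20.0 13.0   (sums of magnitudes)
 print(np.linalg.norm(u, 2), np.linalg.norm(v, 2))            # 14.14 12.04 (distances from the origin)
 print(np.linalg.norm(u, np.inf), np.linalg.norm(v, np.inf))  # 10.0 12.0   (largest magnitudes)

 # Matrix norms for a sample 2x2 matrix
 A = np.array([[1.0, 1.0],
               [1.0, -1.0]])
 print(np.linalg.norm(A, 1))       # 2.0    (largest column 1-norm)
 print(np.linalg.norm(A, np.inf))  # 2.0    (largest row 1-norm)
 print(np.linalg.norm(A, 'fro'))   # 2.0    (Frobenius norm)
 print(np.linalg.norm(A, 2))       # 1.4142 (sqrt of the largest eigenvalue of A.T @ A; the smallest of the four here)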

Condition Numbers

Condition numbers relate to the coefficient matrix of a linear system \(Ax=b\) and specifically look at \(A\) to see how the equations defined by that matrix relate to each other. If the equations in the matrix define lines / planes / hyperplanes that are nearly perpendicular, the condition number will be small (a 2-norm condition number is never less than 1). If the equations define lines / planes / hyperplanes that are nearly parallel to each other, the condition number will be large. The condition number of a matrix \(A\) is the matrix norm of \(A\) multiplied by the matrix norm of the inverse of \(A\). Given that, you have to decide which norm to use to calculate the condition number. The 2-norm is most commonly used because it looks at all the values in the matrix and how they relate to each other.
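
As a rough illustration of the perpendicular-versus-parallel idea, here is another Python/NumPy sketch (the 1.01 coefficient is just a made-up example of a nearly parallel pair of lines):

 import numpy as np

 # Perpendicular lines: x + y = 2 and x - y = 4
 A_perp = np.array([[1.0, 1.0],
                    [1.0, -1.0]])

 # Nearly parallel lines: x + y = 2 and x + 1.01y = 4
 A_par = np.array([[1.0, 1.0],
                   [1.0, 1.01]])

 print(np.linalg.cond(A_perp, 2))  # 1.0       (the smallest possible 2-norm condition number)
 print(np.linalg.cond(A_par, 2))   # about 402 (nearly parallel lines give a large condition number)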

As noted in Chapra, the base-10 log of the condition number gives an upper bound on the number of digits of precision you will lose simply because of the model equations. What that means is the following. Imagine you have a set of equations:

\( \begin{align} x+y&=2\\ x-y&=4 \end{align} \)

You can use a variety of methods to find that \(x\) is 3 and \(y\) is -1. But how accurate are those solutions? Part of that depends on how accurate the measurements that produced the equations were in the first place. How accurate are the 1's in front of the variables? How accurate are the measurements 2 and 4? Your answers cannot be more accurate than those! But then, based on how mutually perpendicular your equations are, you may end up losing additional digits of precision. For the system above, assuming we use the Frobenius norm:

\( \begin{align}
A&=\begin{bmatrix}1&1\\1&-1\end{bmatrix}\\
||A||_{fro}&=\sqrt{1^2 + 1^2 + 1^2 + (-1)^2}=2\\
A^{-1}&=\begin{bmatrix}\frac{1}{2}&\frac{1}{2}\\\frac{1}{2}&-\frac{1}{2}\end{bmatrix}\\
||A^{-1}||_{fro}&=\sqrt{0.5^2 + 0.5^2 + 0.5^2 + (-0.5)^2}=1\\
\mbox{cond(A,'fro')}&=||A||_{fro}\,||A^{-1}||_{fro}=2\\
\mbox{log10(cond(A,'fro'))}&=0.301
\end{align} \)

meaning the solution's accuracy will be somewhere between 0 and 1 digits fewer than the measurement accuracy.
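
The same numbers come out of a short Python/NumPy sketch; here np.linalg.cond(A, 'fro') plays the role of the cond(A,'fro') expression above (again, the use of NumPy is an assumption, not something this example requires):

 import numpy as np

 A = np.array([[1.0, 1.0],
               [1.0, -1.0]])
 b = np.array([2.0, 4.0])

 print(np.linalg.solve(A, b))                    # [ 3. -1.]
 print(np.linalg.norm(A, 'fro'))                 # 2.0
 print(np.linalg.norm(np.linalg.inv(A), 'fro'))  # 1.0
 print(np.linalg.cond(A, 'fro'))                 # 2.0
 print(np.log10(np.linalg.cond(A, 'fro')))       # 0.3010...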