Linear algebra pops up almost everywhere in physics, and the matrix-related techniques developed below will be used repeatedly in later lectures. As a result, we will spend considerable time on matrices, introducing several numerical techniques in detail.
1.1 Examples from Physics
We discuss some elementary examples from undergraduate physics.
Rotations in two dimensions
Consider a two-dimensional Cartesian coordinate system. A point \boldsymbol{r} = (x,y)^T can be rotated counter-clockwise through an angle \theta about the origin, producing a new point \boldsymbol{r}' = (x',y')^T. The two points’ coordinates are related as follows:
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix}
\begin{pmatrix}
x \\
y
\end{pmatrix}
=
\begin{pmatrix}
x' \\
y'
\end{pmatrix}
The 2\times 2 matrix appearing here is an example of a rotation matrix in Euclidean space. If you know \boldsymbol{r}' and wish to calculate \boldsymbol{r}, you need to solve this system of two linear equations.
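As a quick illustration of how this looks numerically, here is a minimal NumPy sketch (the angle and the point are arbitrary example values): it rotates a point and then recovers the original coordinates by solving the 2\times 2 linear system.
import numpy as np

theta = np.pi / 3                       # example rotation angle (arbitrary choice)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

r = np.array([1.0, 2.0])                # example point (x, y)
r_prime = R @ r                         # rotate counter-clockwise through theta

# given r' (and the rotation matrix), recover r by solving R r = r'
r_recovered = np.linalg.solve(R, r_prime)
print(r_prime)
print(r_recovered)                      # should agree with r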
Electrostatic potentials
Assume you have n electric charges q_j (which are unknown) held at the positions \boldsymbol{R}_j (which are known). Further assume that you have measured the electric potential \phi(\boldsymbol{r}_i) at the n known positions \boldsymbol{r}_i. From the definition of the potential (as well as the fact that the potential obeys the principle of superposition), we see that:
\phi(\boldsymbol{r}_i) = \sum_{j=0}^{n-1}\left(\frac{k q_j}{|\boldsymbol{r}_i - \boldsymbol{R}_j|}\right),
where i = 0,1,\dots,n-1. If you assume you have four charges, the above relation turns into the following 4\times 4 linear system of equations:
\begin{pmatrix}
k/|\boldsymbol{r}_0 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_0 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_0 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_0 - \boldsymbol{R}_3| \\
k/|\boldsymbol{r}_1 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_1 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_1 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_1 - \boldsymbol{R}_3| \\
k/|\boldsymbol{r}_2 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_2 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_2 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_2 - \boldsymbol{R}_3| \\
k/|\boldsymbol{r}_3 - \boldsymbol{R}_0| &k/|\boldsymbol{r}_3 - \boldsymbol{R}_1| &k/|\boldsymbol{r}_3 - \boldsymbol{R}_2| &k/|\boldsymbol{r}_3 - \boldsymbol{R}_3|
\end{pmatrix}
\begin{pmatrix}
q_0 \\ q_1 \\ q_2 \\ q_3
\end{pmatrix}
=
\begin{pmatrix}
\phi(\boldsymbol{r}_0) \\ \phi(\boldsymbol{r}_1) \\ \phi(\boldsymbol{r}_2) \\ \phi(\boldsymbol{r}_3)
\end{pmatrix}
which needs to be solved for the 4 unknowns q_0, q_1, q_2 and q_3.
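As an illustration of how this system can be assembled and solved numerically, here is a minimal NumPy sketch; the charge positions, measurement positions, potential values, and the choice k = 1 are all made-up example inputs.
import numpy as np

k = 1.0                                      # Coulomb constant in chosen units (assumed)
R = np.array([[0.0, 0.0], [1.0, 0.0],        # known charge positions R_j (example values)
              [0.0, 1.0], [1.0, 1.0]])
r = np.array([[2.0, 0.0], [0.0, 2.0],        # known measurement positions r_i (example values)
              [2.0, 2.0], [3.0, 3.0]])
phi = np.array([1.3, 0.7, 0.5, 0.2])         # "measured" potentials (made-up numbers)

# build A_ij = k / |r_i - R_j| using broadcasting
A = k / np.linalg.norm(r[:, None, :] - R[None, :, :], axis=-1)

q = np.linalg.solve(A, phi)                  # the four unknown charges
print(q)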
Principal moments of inertia
In the study of the rotation of a rigid body about an arbitrary axis in three dimensions, you may have encountered the moment of inertia tensor:
I_{\alpha \beta} = \int \rho(\boldsymbol{r}) \left(\delta_{\alpha \beta}r^2 - \boldsymbol{r}_\alpha \boldsymbol{r}_\beta\right)d^3 r,
where \rho(\boldsymbol{r}) is the mass density, \alpha and \beta denote Cartesian components, and \delta_{\alpha \beta} is the Kronecker delta.
The moment of inertia tensor is represented by a 3\times 3 matrix:
\boldsymbol{I} =
\begin{pmatrix}
I_{xx} & I_{xy} & I_{xz} \\
I_{yx} & I_{yy} & I_{yz} \\
I_{zx} & I_{zy} & I_{zz}
\end{pmatrix}.
This is a symmetric matrix. It is possible to choose a coordinate system such that the off-diagonal elements vanish. The axes of this coordinate system are known as the principal axes of the body at the origin. In that coordinate system the moment of inertia tensor is represented by a diagonal matrix, with diagonal elements I_0, I_1, and I_2, known as the principal moments. Finding them is an instance of the “eigenvalue problem”.
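As a preview of how such an eigenvalue problem is handled numerically, here is a minimal sketch; the tensor entries are made-up, and np.linalg.eigh is used because the matrix is real and symmetric.
import numpy as np

# made-up symmetric moment of inertia tensor (arbitrary units)
I = np.array([[ 3.0, -1.0,  0.0],
              [-1.0,  3.0,  0.0],
              [ 0.0,  0.0,  5.0]])

# for a real symmetric matrix, eigh returns the eigenvalues (principal moments)
# and an orthonormal set of eigenvectors (the principal axes)
moments, axes = np.linalg.eigh(I)
print(moments)     # principal moments I_0, I_1, I_2
print(axes)        # columns are the corresponding principal axes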
1.2 The problems to be solved
First, we look at the problem where we have n unknowns x_i, along with n\times n coefficients A_{ij} and n constants b_i:
\begin{pmatrix}
A_{00} & A_{01} & \dots & A_{0,n-1} \\
A_{10} & A_{11} & \dots & A_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n-1,0} & A_{n-1,1} & \dots & A_{n-1,n-1}
\end{pmatrix}
\begin{pmatrix}
x_0 \\ x_1 \\ \vdots \\ x_{n-1}
\end{pmatrix}
=
\begin{pmatrix}
b_0 \\ b_1 \\ \vdots \\ b_{n-1}
\end{pmatrix}
where we use a comma to separate two indices when necessary to avoid confusion. These are n linear equations in n unknowns.
In compact matrix form, this problem is written as
\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b},
where \boldsymbol{A} is called the coefficient matrix. This is a problem that we will spend considerable time solving in this lecture. We will be doing this mainly by using the augmented coefficient matrix which places together the elements of \boldsymbol{A} and \boldsymbol{b}, i.e.:
(\boldsymbol{A}|\boldsymbol{b})= \left(
\begin{matrix}
A_{00} & A_{01} & \dots & A_{0,n-1} \\
A_{10} & A_{11} & \dots & A_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n-1,0} & A_{n-1,1} & \dots & A_{n-1,n-1}
\end{matrix}\right|
\left.
\begin{matrix}
b_0 \\ b_1 \\ \vdots \\b_{n-1}
\end{matrix}
\right).
For now we assume the determinant of \boldsymbol{A} satisfies |\boldsymbol{A}| \neq 0.
In a course on linear algebra you have seen examples of legitimate operations one can carry out while solving a system of linear equations. Such operations change the elements of \boldsymbol{A} and \boldsymbol{b}, but leave the solution vector \boldsymbol{x} unchanged. More generally, we are allowed to carry out the following elementary row operations (a short numerical check follows below):
Scaling: each row/equation may be multiplied by a constant (multiplies |\boldsymbol{A}| by the same constant).
Pivoting: two rows/equations may be interchanged (changes sign of |\boldsymbol{A}|).
Elimination: a row/equation may be replaced by a linear combination of that row/equation with any other row/equation (doesn’t change |\boldsymbol{A}|).
Keep in mind that these are operations that are carried out on the augmented coefficient matrix (\boldsymbol{A}|\boldsymbol{b}).
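To see these operations at work, here is a minimal sketch (the system is an arbitrary example): each elementary row operation is applied to the augmented matrix, and the solution of the resulting system is compared with that of the original.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([2.0, 7.0])
aug = np.column_stack((A, b))       # the augmented matrix (A|b)

aug[0, :] *= 3.0                    # scaling: multiply row 0 by a constant
aug[[0, 1], :] = aug[[1, 0], :]     # pivoting: interchange rows 0 and 1
aug[1, :] -= 0.5 * aug[0, :]        # elimination: subtract a multiple of row 0 from row 1

x_original = np.linalg.solve(A, b)
x_modified = np.linalg.solve(aug[:, :2], aug[:, 2])
print(x_original)
print(x_modified)                   # the two solutions should agree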
Second, we wish to tackle the standard form of the matrix eigenvalue problem:
\boldsymbol{A}\boldsymbol{v} = \lambda \boldsymbol{v}.
\tag{1}
Here, both \lambda and the column vector \boldsymbol{v} are unknown. This \lambda is called an eigenvalue and \boldsymbol{v} is called an eigenvector.
Let’s sketch one possible approach to solving this problem. Moving everything to the left-hand side, we have
(\boldsymbol{A} - \lambda \boldsymbol{I})\boldsymbol{v} = \boldsymbol{0},
where \boldsymbol{I} is the n\times n identity matrix and \boldsymbol{0} is an n\times 1 column vector made up of 0s. It is easy to see that we are faced with a system of n linear equations: the coefficient matrix here is \boldsymbol{A} - \lambda \boldsymbol{I}.
The trivial solution is \boldsymbol{v} = 0. In order for a non-trivial solution to exist, we must have vanishing determinant |\boldsymbol{A} - \lambda \boldsymbol{I}| = 0. In other words, the matrix \boldsymbol{A} - \lambda \boldsymbol{I} is singular. Expanding the determinant gives us a polynomial equation, known as the characteristic equation:
(-1)^n\lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0 = 0.
Thus, an n \times n matrix has at most n distinct eigenvalues, which are the roots of the characteristic polynomial. When a root occurs twice, we say that root has multiplicity 2. If a root occurs only once, in other words if it has multiplicity 1, we are dealing with a simple eigenvalue.
Having calculated the eigenvalues, one way to evaluate the eigenvectors is simply by using Equation 1 again.
Specifically, for a given/known eigenvalue, \lambda_i, one tries to solve the system of linear equations (\boldsymbol{A}-\lambda_i\boldsymbol{I})\boldsymbol{v}_i = 0 for \boldsymbol{v}_i.
For each value \lambda_i, we will not be able to determine unique values of \boldsymbol{v}_i, so we will limit ourselves to computing the relative values of the components of \boldsymbol{v}_i.
We will in the following use the notation (v_j)_0, (v_j)_1 etc. to denote the n elements of the column vector \boldsymbol{v}_j.
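For a small matrix this machinery can be made explicit. Here is a minimal sketch (the 2\times 2 matrix is an arbitrary example): the roots of the characteristic polynomial are compared with the eigenvalues returned by NumPy.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.poly(A) returns the coefficients of the characteristic polynomial of A;
# here det(A - lambda I) = lambda^2 - 4 lambda + 3
coeffs = np.poly(A)
print(coeffs)

# the eigenvalues are the roots of the characteristic polynomial
print(np.sort(np.roots(coeffs)))
print(np.sort(np.linalg.eigvals(A)))   # should agree: 1 and 3

# for lambda = 1, solving (A - lambda I) v = 0 fixes v only up to scale,
# e.g. v proportional to (1, -1)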
2 Error Analysis
We now turn to a discussion of practical error estimation in work with matrices. We will provide some general derivations and examples of when a problem is “well-conditioned”, typically by using matrix perturbation theory (i.e., by checking what happens if there are uncertainties in the input data).
After some preliminary comments, examples, and definitions, we will investigate quantitatively how linear systems, eigenvalues, and eigenvectors depend on the input data. We will be examining in each case the simplest scenario but, hopefully, this will be enough to help you grasp the big picture.
2.1 From a posteriori to a priori Estimates
Let us look at a specific 2\times 2 linear system, namely \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b} for the case where
(\boldsymbol{A}|\boldsymbol{b}) = \left(
\begin{matrix}
0.2161 & 0.1441 \\
1.2969 & 0.8648
\end{matrix}\ \right|
\left.
\begin{matrix}
0.1440 \\
0.8642
\end{matrix}
\right).
\tag{2}
Simply put, there are two ways to analyze errors:
an a priori analysis, in which case we try to see how easy/hard the problem is to solve before we begin solving it.
an a posteriori analysis, where we have produced a solution, and attempt to see how good it is.
Let us start with the latter option, an a posteriori approach. Say you are provided with the following approximate solution to the problem defined in Equation 2:
\tilde{\boldsymbol{x}}^T = (0.9911 \quad -0.4870).
\tag{3}
One way of testing how good a solution is, is to evaluate the residual vector:
\boldsymbol{r} = \boldsymbol{b} - \boldsymbol{A} \tilde{\boldsymbol{x}}.
Plugging in Equation 3, we find the residual vector
\boldsymbol{r}^T = (-10^{-8} \quad 10^{-8})
which might naturally lead you to the conclusion that our approximate solution \tilde{\boldsymbol{x}} is pretty good!
However, here’s the thing: the exact solution to our problem is actually:
\boldsymbol{x}^T = (2 \quad -2).
The approximate solution \tilde{\boldsymbol{x}} doesn’t contain even a single correct significant figure!
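This surprise is easy to reproduce; here is a minimal sketch using the matrix, right-hand side, and approximate solution quoted above.
import numpy as np

A = np.array([[0.2161, 0.1441],
              [1.2969, 0.8648]])
b = np.array([0.1440, 0.8642])

x_tilde = np.array([0.9911, -0.4870])   # the approximate solution from Equation 3
r = b - A @ x_tilde                     # the residual vector

print(r)                      # entries of order 1e-8: looks excellent
print(np.linalg.solve(A, b))  # yet the exact solution is (2, -2)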
With the disclaimer that there’s much more that could be said at the a posteriori level, we now drop this line of attack and turn to an a priori analysis: could we have realized that solving the problem in Equation 2 was difficult? How could we know that there’s something pathological about it?
2.2 Magnitude of Determinant?
Example 1
In an attempt to see what is wrong with our previous example in Equation 2, we start by making a small perturbation to the input data. Imagine we didn’t know the values of the coefficients in \boldsymbol{A} all that precisely. Would anything change?
Let us take
\Delta \boldsymbol{A} =
\begin{pmatrix}
0.0001 & 0 \\
0 & 0
\end{pmatrix}
We want to study the effect of this perturbation on the solution, namely
(\boldsymbol{A} + \Delta \boldsymbol{A})(\boldsymbol{x} + \Delta \boldsymbol{x}) = \boldsymbol{b},
where \boldsymbol{b} is kept fixed/unperturbed. For the specific case studied here, we find (using the following program)
(\boldsymbol{x} + \Delta \boldsymbol{x})^T = (-2.31294091\times 10^{-4} \quad 9.99653059\times 10^{-1}).
We see that this is not a “small” effect. Our perturbation amounted to changing only one element of \boldsymbol{A} by less than 0.1\%, yet it had a dramatic impact on the solution to our problem.
import numpy as np

A = np.array([[0.2161, 0.1441],
              [1.2969, 0.8648]])
deltaA = np.array([[0.0001, 0],
                   [0, 0]])
b = np.array([0.1440, 0.8642])

# we use np.linalg.solve(A, b) to solve the equation Ax = b.
x = np.linalg.solve(A, b)
x_dx = np.linalg.solve(A + deltaA, b)

print(x)
print(x_dx)
print("determinant of A is: ", np.linalg.det(A))
[ 2. -2.]
[-2.31294091e-04 9.99653059e-01]
determinant of A is: -9.999999998544968e-09
Example 2
Let us look at the following example
(\boldsymbol{A} | \boldsymbol{b}) =
\left(
\begin{matrix}
1 & 1 \\
1 & 1.001
\end{matrix}\
\right|
\left.
\begin{matrix}
2 \\
2.001
\end{matrix}
\right).
The exact solution is (from the following program)
\boldsymbol{x}^T = (1 \quad 1).
We can then add a perturbation
\Delta\boldsymbol{A} =
\begin{pmatrix}
0 & 0 \\
0 & 0.001
\end{pmatrix}.
Then, the perturbed solution is
(\boldsymbol{x} + \Delta\boldsymbol{x})^T = (1.5 \quad 0.5).
Instead of adding a perturbation to \boldsymbol{A}, one can also add a perturbation to \boldsymbol{b}, with
\Delta \boldsymbol{b}^T = (0 \quad 0.001).
We find (in the following program)
(\boldsymbol{x} + \Delta \boldsymbol{x})^T = (0 \quad 2).
import numpy as np

A = np.array([[1, 1],
              [1, 1.001]])
deltaA = np.array([[0, 0],
                   [0, 0.001]])
b = np.array([2, 2.001])
deltab = np.array([0, 0.001])

# we use np.linalg.solve(A, b) to solve the equation Ax = b.
x = np.linalg.solve(A, b)
x_dx = np.linalg.solve(A + deltaA, b)
x_dx2 = np.linalg.solve(A, b + deltab)

print(x)
print(x_dx)
print(x_dx2)
print("determinant of A is: ", np.linalg.det(A))
[1. 1.]
[1.5 0.5]
[0. 2.]
determinant of A is: 0.00099999999999989
Example 3
There are also cases where small perturbations do not lead to dramatic changes in the solution; see the following code.
import numpy as np

# A|b:  2  1 | 2
#       1  2 | 7
A = np.array([[2, 1],
              [1, 2]])
b = np.array([2, 7])

# Delta A:  0     0
#           0.01  0
deltaA = np.array([[0, 0],
                   [0.01, 0]])

# Delta b:  0.01
#           0
deltab = np.array([0.01, 0])

# we use np.linalg.solve(A, b) to solve the equation Ax = b.
x = np.linalg.solve(A, b)
x_dx = np.linalg.solve(A + deltaA, b)
x_dx2 = np.linalg.solve(A, b + deltab)

print(x)
print(x_dx)
print(x_dx2)
print("determinant of A is: ", np.linalg.det(A))
[-1. 4.]
[-1.00334448 4.00668896]
[-0.99333333 3.99666667]
determinant of A is: 2.9999999999999996
2.3 Norms for Matrices and Vectors
Example 4
Consider the following question: what does “small determinant” mean? If the definition is “much less than 1”, then one might counter-argue: what about the following matrix:
\boldsymbol{A} =
\begin{pmatrix}
0.2 & 0.1 \\
0.1 & 0.2
\end{pmatrix},
which is just the matrix \boldsymbol{A} in Example 3 multiplied by 0.1. This matrix has a determinant \det(\boldsymbol{A}) = 0.03, which is certainly much less than 1. If you also multiply each element of \boldsymbol{b} in the previous example by 0.1, you should get the same answer. What’s more, this linear system of equations should be equally sensitive to perturbations.
Thus, the value of the determinant should be compared with the magnitude of the relevant matrix elements.
Definitions and Properties for Matrices
Let us provide our intuitions with quantitative backing. We shall introduce the matrix norm, which measures the magnitude of \boldsymbol{A}. There are several possible definitions of a norm; we will employ two of them.
Euclidean norm:
\|\boldsymbol{A} \|_E = \sqrt{\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} |A_{ij}|^2},
which is sometimes also called the Frobenius norm.
Infinity norm:
\| \boldsymbol{A} \|_{\infty} = \max_{0\leq i \leq n-1} \sum_{j=0}^{n-1} |A_{ij}|,
which is also known as the maximum row-sum norm.
Regardless of the specific norm definitions, all matrix norms for square matrices obey:
\|\boldsymbol{A} \| \geq 0
\| \boldsymbol{A} \| = 0 if and only if all A_{ij} = 0
Comparing the determinant with the norm, i.e., flagging a matrix as near-singular when |\det(\boldsymbol{A})| \ll \|\boldsymbol{A}\|, seems to be consistent with what we saw above:
Examples 1 and 2 are near-singular, while Example 3 is not singular.
For Example 4, this criterion claims that our matrix is not quite singular (though it’s getting there).
Our introduction of the concept of the matrix norm seems to have served its purpose: a small determinant needs to be compared to the matrix norm, so Example 4 (despite having a small determinant) is not singular, given that its matrix elements are small, too.
Definitions for Vectors
Let us also introduce the corresponding norms for vectors:
\|\boldsymbol{x}\|_E = \sqrt{\sum_{i = 0}^{n-1} |x_i|^2}, \quad \|\boldsymbol{x} \|_{\infty} = \max_{0 \leq i \leq n-1} |x_i|.
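Both matrix and vector norms are available directly in NumPy; here is a minimal sketch (the matrix and vector are arbitrary example values).
import numpy as np

A = np.array([[0.2, 0.1],
              [0.1, 0.2]])
x = np.array([3.0, -4.0])

print(np.linalg.norm(A, 'fro'))    # Euclidean (Frobenius) norm of A
print(np.linalg.norm(A, np.inf))   # infinity norm of A: maximum row sum of |A_ij|
print(np.linalg.norm(x))           # Euclidean norm of x, here 5.0
print(np.linalg.norm(x, np.inf))   # infinity norm of x, here 4.0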
2.4 Condition Number for Linear Systems
Unfortunately, our criterion |\det(\boldsymbol{A})|\ll \| \boldsymbol{A} \| is flawed, even though it appears in many textbooks. We will look at two examples.
Example 5
We shall consider
\boldsymbol{A} =
\begin{pmatrix}
2 \times 10^{-10} & 1 \times 10^{-10} \\
1 \times 10^{-10} & 2 \times 10^{-10}
\end{pmatrix},
which is the matrix in Example 3 multiplied by 10^{-10}. Here, we have |\det(\boldsymbol{A})| = 3 \times 10^{-20}, and \|\boldsymbol{A} \|_E \simeq 3.16 \times 10^{-10}, so |\det(\boldsymbol{A})| \ll \|\boldsymbol{A} \|_E holds.
But isn’t this strange? Simply multiplying a set of equations by a small number cannot be enough to make the problem near-singular.
Example 6
Let us look at the following 8 \times 8 problem:
\boldsymbol{A} =
\begin{pmatrix*}[r]
2 & -2 & -2 & \cdots & -2 \\
0 & 2 & -2 & \cdots & -2 \\
0 & 0 & 2 & \cdots & -2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix*}
The corresponding results are |\det(\boldsymbol{A})| = 256, and \|\boldsymbol{A} \|_E = 12, so |\det(\boldsymbol{A})| \gg \|\boldsymbol{A} \|_E holds.
Now, take a look at the following code.
import numpy as np

A = np.array([[2, -2, -2, -2, -2, -2, -2, -2],
              [0,  2, -2, -2, -2, -2, -2, -2],
              [0,  0,  2, -2, -2, -2, -2, -2],
              [0,  0,  0,  2, -2, -2, -2, -2],
              [0,  0,  0,  0,  2, -2, -2, -2],
              [0,  0,  0,  0,  0,  2, -2, -2],
              [0,  0,  0,  0,  0,  0,  2, -2],
              [0,  0,  0,  0,  0,  0,  0,  2]])
b = np.array([1, -1, 1, -1, 1, -1, 1, -1])

deltaA = np.zeros((8, 8))
deltaA[-1, 0] = -0.01   # bottom-left element is -0.01

# we use np.linalg.solve(A, b) to solve the equation Ax = b.
x = np.linalg.solve(A, b)
x_dx = np.linalg.solve(A + deltaA, b)

print(x)
print(x_dx)
print("determinant of A is: ", np.linalg.det(A))
print("norm of A is: ", np.linalg.norm(A))
[-21. -11. -5. -3. -1. -1. 0. -0.5]
[-30.88235294 -15.94117647 -7.47058824 -4.23529412 -1.61764706
-1.30882353 -0.15441176 -0.65441176]
determinant of A is: 255.99999999999994
norm of A is: 12.0
Derivation
In the present subsection, we will carry out an informal derivation that will point us toward a quantitative measure of ill-conditioning. This measure of the sensitivity of our problem to small changes in its elements will be called the condition number.
Let us start with the unperturbed problem
\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}
and the perturbed one
(\boldsymbol{A} + \Delta \boldsymbol{A})(\boldsymbol{x} + \Delta \boldsymbol{x}) = \boldsymbol{b}.
Combining the above two equations, we have
\boldsymbol{A}\Delta \boldsymbol{x} = -\Delta \boldsymbol{A}(\boldsymbol{x} + \Delta \boldsymbol{x}).
Assuming \boldsymbol{A} is nonsingular (so you can invert it), we get
\Delta \boldsymbol{x} = -\boldsymbol{A}^{-1}\Delta \boldsymbol{A} (\boldsymbol{x} + \Delta \boldsymbol{x}).
Now, taking the norm on both sides, we obtain
\| \Delta \boldsymbol{x}\| = \| \boldsymbol{A}^{-1}\Delta \boldsymbol{A} (\boldsymbol{x} + \Delta \boldsymbol{x})\|
\leq \| \boldsymbol{A}^{-1} \| \| \Delta \boldsymbol{A} \| \| \boldsymbol{x}+ \Delta \boldsymbol{x} \|.
Dividing both sides by \|\boldsymbol{x} + \Delta \boldsymbol{x}\| and inserting a factor of \|\boldsymbol{A}\|/\|\boldsymbol{A}\| on the right-hand side gives \|\Delta \boldsymbol{x}\|/\|\boldsymbol{x} + \Delta \boldsymbol{x}\| \leq \|\boldsymbol{A}\|\|\boldsymbol{A}^{-1}\| \, \|\Delta \boldsymbol{A}\|/\|\boldsymbol{A}\|. In other words, an error bound on \|\Delta \boldsymbol{A}\|/ \|\boldsymbol{A} \| translates into an error bound on \|\Delta \boldsymbol{x}\|/\|\boldsymbol{x}\| \simeq \|\Delta \boldsymbol{x}\|/\|\boldsymbol{x} + \Delta \boldsymbol{x}\|.
This leads to the introduction of the condition number:
\kappa(\boldsymbol{A}) = \| \boldsymbol{A}\|\| \boldsymbol{A}^{-1} \|,
which determines if a small perturbation gets amplified when solving for \boldsymbol{x} or not.
A large condition number leads to an amplification of a small perturbation: we say we are dealing with an ill-conditioned problem. If the condition number is of order unity, then a small perturbation is not amplified, so we are dealing with a well-conditioned problem.
Examples
Example 1: \kappa(\boldsymbol{A}) = 249729267.388, ill-conditioned.
Example 2: \kappa(\boldsymbol{A}) = 4002.001, ill-conditioned.
Example 3: \kappa(\boldsymbol{A}) = 3.33, well-conditioned.
Example 4: \kappa(\boldsymbol{A}) = 3.33, well-conditioned.
Example 5: \kappa(\boldsymbol{A}) = 3.33, well-conditioned.
Example 6: \kappa(\boldsymbol{A}) = 512.18, ill-conditioned.
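These values can be reproduced with a few lines of NumPy (using the Frobenius norm, as in the code above); here is a minimal sketch for two of the matrices.
import numpy as np

def condition_number(A):
    # kappa(A) = ||A|| * ||A^{-1}||, here with the Frobenius norm
    return np.linalg.norm(A) * np.linalg.norm(np.linalg.inv(A))

A1 = np.array([[0.2161, 0.1441],
               [1.2969, 0.8648]])       # Example 1
A3 = np.array([[2.0, 1.0],
               [1.0, 2.0]])             # Example 3

print(condition_number(A1))   # huge (of order 1e8): ill-conditioned
print(condition_number(A3))   # about 3.33: well-conditioned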
2.5 Condition Number for Simple Eigenvalues
Having studied the sensitivity of a linear system to small perturbations, the natural place to go from there is to carry out an analogous study for the eigenvalue problem. For now, let us focus on the effect of the perturbations on the evaluation of the eigenvalues, \lambda.
Example 7
Take
\boldsymbol{A} =
\begin{pmatrix}
4 & 3 & 2 & 1 \\
3 & 3 & 2 & 1 \\
0 & 2 & 2 & 1 \\
0 & 0 & 1 & 1
\end{pmatrix}.
You can evaluate its eigenvalues with np.linalg.eigvals(), as in the following Python code.
We now add a perturbation of 0.01 to the top-right element. Comparing the output before and after the perturbation, we see that the smaller the eigenvalue, the bigger the relative impact of the perturbation.
You might suspect that this sensitivity simply reflects a large condition number \kappa(\boldsymbol{A}); indeed, \kappa(\boldsymbol{A}) \simeq 126.744 here. The next example shows that this cannot be the whole story.
import numpy as np

A = np.array([[4, 3, 2, 1],
              [3, 3, 2, 1],
              [0, 2, 2, 1],
              [0, 0, 1, 1]], dtype=np.float64)

print('Eigenvalues of A:')
print(np.sort(np.linalg.eigvals(A)))
print('======================')
print('Condition number of A:')
print(np.linalg.norm(A) * np.linalg.norm(np.linalg.inv(A)))
print('======================')

A[0, -1] = A[0, -1] + 0.01
print('Eigenvalues of A + dA:')
print(np.sort(np.linalg.eigvals(A)))
Eigenvalues of A:
[0.13674761 0.48387934 2.06663092 7.31274213]
======================
Condition number of A:
126.74383614203887
======================
Eigenvalues of A + dA:
[0.12477715 0.49937427 2.06287308 7.3129755 ]
Example 8
Let us look at another example, this time with a smaller condition number, whose eigenvalues are nonetheless more sensitive to perturbations; see the following code.
import numpy as np

A = np.array([[4, 4, 0, 0],
              [0, 3, 4, 0],
              [0, 0, 2, 4],
              [0, 0, 0, 1]], dtype=np.float64)

print('Eigenvalues of A:')
print(np.sort(np.linalg.eigvals(A)))
print('======================')
print('Condition number of A:')
print(np.linalg.norm(A) * np.linalg.norm(np.linalg.inv(A)))
print('======================')

# Perturb the bottom-left element
A[-1, 0] = A[-1, 0] + 0.005
print('Eigenvalues of A + dA:')
print(np.sort(np.linalg.eigvals(A)))
Eigenvalues of A:
[1. 2. 3. 4.]
======================
Condition number of A:
40.129477943277564
======================
Eigenvalues of A + dA:
[0.95115768 2.18205744 2.81794256 4.04884232]
Derivation
We saw that some matrices have eigenvalues that are sensitive to small perturbations, whereas others do not. In the present subsection, we will carry out an informal derivation that will point us toward a quantitative measure of the conditioning of eigenvalues (this is not \kappa(\boldsymbol{A})). This quantitative measure of the sensitivity of our problem to small changes in the input matrix elements will be called the condition number for simple eigenvalues; “simple” means we don’t have repeated eigenvalues.
Let us start with the unperturbed problem
\boldsymbol{A}\boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i,
\tag{4}
and the perturbed problem
(\boldsymbol{A} + \Delta\boldsymbol{A})(\boldsymbol{v}_i + \Delta \boldsymbol{v}_i) = (\lambda_i + \Delta \lambda_i)( \boldsymbol{v}_i + \Delta \boldsymbol{v}_i).
\tag{5}
We do not assume \boldsymbol{A} is symmetric, so the “eigenvectors” should actually be called right eigenvectors. We can also introduce the left eigenvectors as follows
\boldsymbol{u}_i^T \boldsymbol{A} = \lambda_i\boldsymbol{u}_i^T.
Note that more generally we should take the hermitian conjugate/conjugate-transpose \dagger, but this is not important here as \boldsymbol{A} is real.
Notice that the left eigenvectors of \boldsymbol{A} are actually right eigenvectors of \boldsymbol{A}^T, namely
\boldsymbol{A}^T\boldsymbol{u}_i = \lambda_i \boldsymbol{u}_i.
In fact, \boldsymbol{A}^T and \boldsymbol{A} have the same eigenvalues. In other words, left and right eigenvectors have the same eigenvalues (homework).
Now, combining Equation 4 and Equation 5, and dropping terms of second order in \Delta, we obtain
\boldsymbol{A}\Delta \boldsymbol{v}_i + \Delta \boldsymbol{A} \boldsymbol{v}_i
=
\lambda_i \Delta \boldsymbol{v}_i + \Delta \lambda_i \boldsymbol{v}_i.
Multiplying the last equation with \boldsymbol{u}_i^T on the left, we obtain
\boldsymbol{u}_i^T \boldsymbol{A} \Delta \boldsymbol{v}_i
+ \boldsymbol{u}_i^T \Delta \boldsymbol{A} \boldsymbol{v}_i
=
\lambda_i \boldsymbol{u}_i^T \Delta \boldsymbol{v}_i
+ \Delta \lambda_i \boldsymbol{u}_i^T \boldsymbol{v}_i.
The first term on the LHS and the first term on the RHS cancel, which gives rise to
\boldsymbol{u}_i^T \Delta \boldsymbol{A} \boldsymbol{v}_i
= \Delta \lambda_i \boldsymbol{u}_i^T \boldsymbol{v}_i.
Thus, we obtain
|\Delta \lambda_i| = \frac{|\boldsymbol{u}_i^T \Delta \boldsymbol{A} \boldsymbol{v}_i|}{|\boldsymbol{u}_i^T \boldsymbol{v}_i|} \leq
\frac{1}{|\boldsymbol{u}_i^T \boldsymbol{v}_i|} \|\Delta \boldsymbol{A}\|,
where in the last inequality we have used the fact |\boldsymbol{u}_i^T \Delta \boldsymbol{A} \boldsymbol{v}_i|\leq \|\boldsymbol{u}_i\| \|\Delta \boldsymbol{A}\| \|\boldsymbol{v}_i\| = \|\Delta \boldsymbol{A}\|, with the assumption \|\boldsymbol{u}_i\| = \|\boldsymbol{v}_i\| = 1.
Thus, we are led to introduce a new condition number for simple eigenvalues:
\kappa_{ev}^{\lambda_i}(\boldsymbol{A}) = \frac{1}{|\boldsymbol{u}_i^T \boldsymbol{v}_i|},
where the subscript is there to remind us that this is a condition number for a specific problem: for the evaluation of eigenvalues. The superscript keeps track of which specific eigenvalue’s sensitivity we are referring to.
Thus, we have the bound
|\Delta \lambda_i| \leq \kappa_{ev}^{\lambda_i}(\boldsymbol{A}) \|\Delta \boldsymbol{A}\|.
To calculate the condition number for a given eigenvalue, you first have to calculate the inner product of the corresponding left and right eigenvectors.
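Here is a minimal sketch of that computation for the matrix of Example 8 (since its eigenvalues are distinct, sorting is enough to match the left and right eigenvectors; np.linalg.eig already returns unit-norm eigenvectors).
import numpy as np

A = np.array([[4, 4, 0, 0],
              [0, 3, 4, 0],
              [0, 0, 2, 4],
              [0, 0, 0, 1]], dtype=np.float64)

lam, V = np.linalg.eig(A)       # right eigenvectors (columns of V)
lamL, U = np.linalg.eig(A.T)    # right eigenvectors of A^T = left eigenvectors of A

# put both sets in ascending eigenvalue order so that column i matches column i
V = V[:, np.argsort(lam)]
U = U[:, np.argsort(lamL)]
lam = np.sort(lam)

for i in range(len(lam)):
    kappa_ev = 1.0 / abs(U[:, i] @ V[:, i])
    print('lambda =', lam[i], ' condition number =', kappa_ev)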
Remarks
Real symmetric (or complex Hermitian) matrices are very common in physics. For these matrices, the left and right eigenvectors are the same, and thus we have
\kappa_{ev}^{\lambda_i}(\boldsymbol{A}) = 1.
Hence, for these matrices the eigenvalue problem is always well-conditioned.
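A quick numerical illustration of this (the symmetric matrix and the perturbation are arbitrary example values): each eigenvalue moves by no more than roughly the norm of the perturbation.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])          # real symmetric example
deltaA = np.zeros((3, 3))
deltaA[0, 2] = deltaA[2, 0] = 0.01       # small symmetric perturbation

# eigvalsh is the routine for symmetric/Hermitian matrices
lam = np.linalg.eigvalsh(A)
lam_perturbed = np.linalg.eigvalsh(A + deltaA)

print(np.abs(lam_perturbed - lam))       # each shift is small
print(np.linalg.norm(deltaA))            # compared with the norm of the perturbation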
2.6 Sensitivity of eigenvectors
Derivation
Let us now derive the corresponding quantitative measure of the conditioning of eigenvectors. Let us start with
\boldsymbol{A}\Delta \boldsymbol{v}_i + \Delta \boldsymbol{A} \boldsymbol{v}_i
=
\lambda_i \Delta \boldsymbol{v}_i + \Delta \lambda_i \boldsymbol{v}_i.
\tag{6}
We now expand the perturbation in the eigenvector in terms of the other eigenvectors
\Delta \boldsymbol{v}_i = \sum_{j\neq i} t_{ji} \boldsymbol{v}_j.
Here, the coefficients t_{ji} are to be determined; we are relying on the linear independence of the eigenvectors. Note that the sum does not include a j = i term: if \Delta \boldsymbol{v}_i had a component along \boldsymbol{v}_i, we could simply absorb it into \boldsymbol{v}_i by adjusting the coefficient in front of \boldsymbol{v}_i.
Now, if we plug this expansion into Equation 6, we obtain
\sum_{j\neq i}(\lambda_j - \lambda_i) t_{ji} \boldsymbol{v}_j + \Delta \boldsymbol{A} \boldsymbol{v}_i = \Delta \lambda_i \boldsymbol{v}_i.
Multiplying by \boldsymbol{u}_k^T (with k \neq i) from the left, and using the fact that left and right eigenvectors belonging to different eigenvalues are orthogonal, namely \boldsymbol{u}_k^T \boldsymbol{v}_j = 0 for k \neq j, we find
(\lambda_k - \lambda_i) t_{ki} \boldsymbol{u}_k^T\boldsymbol{v}_k
+ \boldsymbol{u}_k^T \Delta \boldsymbol{A} \boldsymbol{v}_i = 0,
and this means
t_{ki} = \frac{\boldsymbol{u}_k^T \Delta \boldsymbol{A} \boldsymbol{v}_i}{(\lambda_i - \lambda_k)\boldsymbol{u}_k^T \boldsymbol{v}_k}.
Thus, we obtain
\Delta \boldsymbol{v}_i =
\sum_{j\neq i }
\frac{\boldsymbol{u}_j^T \Delta \boldsymbol{A} \boldsymbol{v}_i}{(\lambda_i - \lambda_j)\boldsymbol{u}_j^T \boldsymbol{v}_j} \boldsymbol{v}_j.
We see that the numerator contains the perturbation \Delta \boldsymbol{A}.
The denominator contains two distinct contributions: (1) a \boldsymbol{u}_j^T \boldsymbol{v}_j term, which is the same quantity that appears in the condition number for a simple eigenvalue; (2) a (\lambda_i - \lambda_j) term, which is the separation between \lambda_i and the other eigenvalues.
Example 9
See the following code. While the eigenvalues are well-conditioned, they happen to be very closely spaced, so the (\lambda_i - \lambda_j) factor in the denominator is small, which makes the eigenvectors sensitive to perturbations.
import numpy as np

A = np.array([[1.01, 0.01],
              [0.00, 0.99]])

# Find eigenvalues and eigenvectors
print('Before perturbation:')
ee, ev = np.linalg.eig(A)
for jj in range(len(ee)):
    print('Eigenvalue', ee[jj], 'with eigenvector', ev[:, jj])
    print('====================================================')

print('After perturbation:')
A[0, -1] += 0.005   # perturb the top-right element
ee, ev = np.linalg.eig(A)
for jj in range(len(ee)):
    print('Eigenvalue', ee[jj], 'with eigenvector', ev[:, jj])
    print('====================================================')
Before perturbation:
Eigenvalue 1.01 with eigenvector [1. 0.]
====================================================
Eigenvalue 0.99 with eigenvector [-0.4472136 0.89442719]
====================================================
After perturbation:
Eigenvalue 1.01 with eigenvector [1. 0.]
====================================================
Eigenvalue 0.99 with eigenvector [-0.6 0.8]
====================================================
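As a consistency check of the formula derived above, here is a minimal sketch (assuming the matrix and the top-right perturbation used in Example 9): it builds the first-order prediction for the change in the eigenvector of \lambda = 0.99 and should approximately reproduce the perturbed eigenvector (-0.6, 0.8), up to an overall sign.
import numpy as np

A = np.array([[1.01, 0.01],
              [0.00, 0.99]])
deltaA = np.zeros((2, 2))
deltaA[0, 1] = 0.005                  # same perturbation as in Example 9

lam, V = np.linalg.eig(A)             # right eigenvectors (columns of V)
lamL, U = np.linalg.eig(A.T)          # left eigenvectors of A

# reorder the left eigenvectors so that column j corresponds to lam[j]
Um = np.empty_like(U)
Um[:, np.argsort(lam)] = U[:, np.argsort(lamL)]

i = int(np.argmin(lam))               # focus on the eigenvalue 0.99
dv = np.zeros(2)
for j in range(len(lam)):
    if j == i:
        continue
    t_ji = (Um[:, j] @ deltaA @ V[:, i]) / ((lam[i] - lam[j]) * (Um[:, j] @ V[:, j]))
    dv += t_ji * V[:, j]

v_new = V[:, i] + dv
print(v_new / np.linalg.norm(v_new))  # close to [-0.6, 0.8] (possibly times -1)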
3 Homework
Prove that the left and right eigenvalues are the same, i.e., that \boldsymbol{A}^T and \boldsymbol{A} have the same eigenvalues.
Compute the eigenvalues and eigenvectors (using np.linalg.eig()) of
\boldsymbol{A} =
\begin{pmatrix}
1 & 2 & 3 \\
0 & 4 & 5 \\
0 & 0 & 4.001
\end{pmatrix}.
Then introduce a perturbation which adds 0.005 to the bottom-left element. Compute the eigenvalues and eigenvectors of the perturbed matrix. Also calculate the condition number for each eigenvalue (using Python).