\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Linear Algebra Notes

Study notes on Introduction to Linear Algebra, 5th Edition, by Gilbert Strang


Chapter 1 Introduction to Vectors

Note

Linear combination: \(c\v + d \w = c\bb 1 \\ 1 \eb + d\bb 2 \\ 3 \eb = \bb c + 2d \\ c + 3d \eb\)

The vectors \(c\bs{v}\) lie along a line. When \(\bs{w}\) is not on that line, the combinations \(c\bs{v} + d\bs{w}\) fill the whole two-dimensional plane.


Chapter 1.1 Vectors and Linear Combinations

Column Vector \(\bs{v}\)

\[\begin{split}\v = \bb v_1 \\ v_2 \eb\end{split}\]

VECTOR ADDITION

\[\begin{split}\v = \bb v_1 \\ v_2 \eb \ \mathrm{and}\ \w = \bb w_1 \\ w_2 \eb \ \mathrm{add}\ \mathrm{to}\ \v + \w = \bb v_1 + w_1 \\ v_2 + w_2 \eb\end{split}\]

SCALAR MULTIPLICATION

\[\begin{split}2\v = \bb 2v_1 \\ 2v_2 \eb = \v + \v \\ -\v = \bb -v_1 \\ -v_2 \eb\end{split}\]

Linear Combinations

Note

The sum of \(c\v\) and \(d\w\) is a linear combination \(c\v + d\w\).

Four special linear combinations are: sum, difference, zero, and a scalar multiple \(c\v\):

  • \(1\v + 1\w =\) sum of vectors

  • \(1\v - 1\w =\) difference of vectors

  • \(0\v + 0\w =\) zero vector

  • \(c\v + 0\w =\) vector \(\v\)

Note

Represent vector \(\v\): Two numbers, Arrow from (0, 0), Point in the plane

Tip

Visualization of \(\v + \w\) (head to tail): At the end of \(\v\), place the start of \(\w\).

Vectors in Three Dimensions

Note

From now on \(\v = \bb 1 \\ 1 \\ -1 \eb\) is also written as \(\v = (1, 1, -1)\).

The Important Questions

If \(\u, \v \ \mathrm{and}\ \w\) are typical nonzero vectors (components chosen at random)

  1. What is the picture of all combinations \(c\u\)?

    The combinations \(c\u\) fill a line through (0, 0, 0).

  2. What is the picture of all combinations \(c\u + d\v\)?

    The combinations \(c\u + d\v\) fill a plane through (0, 0, 0).

  3. What is the picture of all combinations \(c\u + d\v + e\w\)?

    The combinations \(c\u + d\v + e\w\) fill a three-dimensional space.


Chapter 1.2 Lengths and Dot Products

Note

The dot product or inner product of \(\v = (v_1, v_2)\) and \(\w = (w_1, w_2)\) is the number \(\v\cd\w\):

  • \(\v\cd\w = v_1w_1 + v_2w_2\).

The dot product of perpendicular vectors is zero.

Tip

The dot product \(\w\cd\v\) equals \(\v\cd\w\). The order of \(\v\) and \(\w\) makes no difference.

Main point: For \(\v\cd\w\), multiply each \(v_i\) times \(w_i\). Then \(\v\cd\w = v_1w_1 + \cds + v_nw_n\).

Lengths and Unit Vectors

Note

DEFINITION: The length \(\lv\v\rv\) of a vector \(\v\) is the square root of \(\v\cd\v\):

  • length \(=\lv\v\rv=\sqrt{\v\cd\v} = (v_1^2 + v_2^2 + \cds + v_n^2)^{1/2}\).

Note

DEFINITION: A unit vector \(\u\) is a vector whose length equals one. Then \(\u\cd\u = 1\).

The standard unit vectors along the x and y axes are written \(\i\) and \(\j\).

For a unit vector, divide any nonzero vector \(\v\) by its length \(\lv\v\rv\).

Note

Unit vector: \(\u = \v/\lv\v\rv\) is a unit vector in the same direction as \(\v\).
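A minimal NumPy sketch of these definitions (the vectors here are chosen only for illustration):

    import numpy as np

    v = np.array([3.0, 4.0])
    w = np.array([4.0, -3.0])

    print(v @ w)                 # dot product 3*4 + 4*(-3) = 0, so v is perpendicular to w
    length = np.sqrt(v @ v)      # ||v|| = sqrt(9 + 16) = 5
    u = v / length               # unit vector in the same direction as v
    print(length, u @ u)         # 5.0 and u.u = 1.0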

The Angle Between Two Vectors

Note

Right angles: The dot product is \(\v\cd\w = 0\) when \(\v\) is perpendicular to \(\w\).

Proof: When \(\v\) and \(\w\) are perpendicular, they form two sides of a right triangle. The third side is \(\v - \w\). The Pythagoras Law for the sides of a right triangle is \(a^2 + b^2 = c^2\).

Perpendicular vectors: \(\lv\v\rv^2 + \lv\w\rv^2 = \lv\v - \w\rv^2\)

Writing out the formulas for those lengths in two dimensions, this equation is Pythagoras:

\[\begin{split}(v_1^2 + v_2^2) + (w_1^2 + w_2^2) &= (v_1 - w_1)^2 + (v_2 - w_2)^2 \\ 0 &= -2v_1w_1 - 2v_2w_2\end{split}\]

which leads to \(v_1w_1 + v_2w_2 = 0\).

Conclusion: Right angles produce \(\v\cd\w = 0\). The dot product is zero when the angle is \(\theta = 90^\circ\). Then \(\cos\theta = 0\). The zero vector \(\v = \bs{0}\) is perpendicular to every vector \(\w\) because \(\bs{0}\cd\w\) is always zero.

The angle between \(\v\) and \(\w\) is

  • less than \(90^\circ\) when \(\v\cd\w\) is positive.

  • greater than \(90^\circ\) when \(\v\cd\w\) is negative.

Note

Unit vectors \(\u\) and \(\bs{U}\) at angle \(\theta\) have \(\u\cd\bs{U} = \cos\theta\). Certainly \(|\u\cd\bs{U}| \leq 1\).

The dot product of unit vectors is between -1 and 1. The cosine of \(\theta\) is revealed by \(\u\cd\bs{U}\).

When the vectors are \(\u = (\cos\theta, \sin\theta)\) and \(\bs{U} = (1, 0)\), the dot product is \(\u\cd\bs{U} = \cos\theta\). That is the cosine of the angle between them.

After rotation through any angle \(\alpha\), these are still unit vectors. The vector \(\bs{U}\) rotates to \((\cos\alpha, \sin\alpha)\). The vector \(\u\) rotates to \((\cos\beta, \sin\beta)\) with \(\beta = \alpha + \theta\). Their dot product is \(\cos\alpha\cos\beta + \sin\alpha\sin\beta = \cos(\beta-\alpha) = \cos\theta\).

Note

COSINE FORMULA: If \(\v\) and \(\w\) are nonzero vectors, then \(\dp\frac{\v\cd\w}{\lv\v\rv\lv\w\rv} = \cos\theta\)

Since \(\cos\theta\) never exceeds 1, the cosine formula gives two great inequalities:

Note

SCHWARZ INEQUALITY: \(|\v\cd\w| \leq \lv\v\rv\lv\w\rv\)

TRIANGLE INEQUALITY: \(\lv\v+\w\rv \leq \lv\v\rv + \lv\w\rv\)
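A short numeric check of the cosine formula and both inequalities, with two arbitrary nonzero vectors (chosen here only for illustration):

    import numpy as np

    v = np.array([1.0, 2.0, 2.0])
    w = np.array([2.0, -1.0, 1.0])

    nv, nw = np.linalg.norm(v), np.linalg.norm(w)
    cos_theta = (v @ w) / (nv * nw)                 # cosine formula
    print(abs(v @ w) <= nv * nw)                    # Schwarz inequality: True
    print(np.linalg.norm(v + w) <= nv + nw)         # Triangle inequality: True
    print(np.degrees(np.arccos(cos_theta)))         # the angle theta between v and w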

Tip

Geometric mean \(\leq\) Arithmetic mean

\(\dp ab \leq \frac{a^2 + b^2}{2}\), \(\dp \sqrt{xy} \leq \frac{x + y}{2}\)

Notes on Computing


Chapter 1.3 Matrices

The linear combinations of the three vectors \(\u = (1, -1, 0)\), \(\v = (0, 1, -1)\), \(\w = (0, 0, 1)\) are \(x_1\u + x_2\v + x_3\w\):

\[\begin{split}x_1 \bb 1 \\ -1 \\ 0 \eb + x_2 \bb 0 \\ 1 \\ -1 \eb + x_3 \bb 0 \\ 0 \\ 1 \eb = \bb x_1 \\ x_2 - x_1 \\ x_3 - x_2 \eb.\end{split}\]

Rewrite that combination using a matrix:

Note

Matrix times vector, Combination of columns:

  • \(A\x = \bb 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \eb \bb x_1 \\ x_2 \\ x_3 \eb = \bb x_1 \\ x_2 - x_1 \\ x_3 - x_2 \eb\)

The matrix \(A\) acts on the vector \(\x\). The output \(A\x\) is a combination \(\b\) of the columns of \(A\).

\[\begin{split}A\x = \bb 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \eb \bb x_1 \\ x_2 \\ x_3 \eb = \bb \bs{x_1} \\ \bs{x_2} - \bs{x_1} \\ \bs{x_3} - \bs{x_2} \eb = \bb b_1 \\ b_2 \\ b_3 \eb = \b\end{split}\]

This \(A\) is a “difference matrix” because \(\b\) contains differences of the components of the input vector \(\x\).

\(A\x\) is also dot products with rows:

\[\begin{split}A\x = \bb 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \eb \bb x_1 \\ x_2 \\ x_3 \eb = \bb (1,0,0)\cd(x_1,x_2,x_3) \\ (-1,1,0)\cd(x_1,x_2,x_3) \\ (0,-1,1)\cd(x_1,x_2,x_3) \eb.\end{split}\]

Linear combinations are the key to linear algebra, and the output \(A\x\) is a linear combination of the columns of \(A\).
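Both descriptions of \(A\x\) give the same output vector; here is a minimal NumPy sketch with the difference matrix above (the input \(\x\) is chosen only for illustration):

    import numpy as np

    A = np.array([[ 1, 0, 0],
                  [-1, 1, 0],
                  [ 0,-1, 1]])
    x = np.array([1, 4, 9])

    by_rows    = np.array([A[i, :] @ x for i in range(3)])   # dot products with the rows
    by_columns = x[0]*A[:, 0] + x[1]*A[:, 1] + x[2]*A[:, 2]  # combination of the columns
    print(by_rows, by_columns, A @ x)                        # all three equal (1, 3, 5)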

Linear Equations

Now we think of \(\b\) as known and we look for \(\x\).

Old question: Compute the linear combination \(x_1\u + x_2\v + x_3\w\) to find \(\b\).

New question: Which combination of \(\u, \v, \w\) produces a particular vector \(\b\)?

Note

Equations \(A\x = \b\): \(\begin{matrix} x_1 = b_1 \\ -x_1+x_2=b_2 \\ -x_2+x_3=b_3 \end{matrix}\). Solution \(\x = A^{-1}\b\): \(\begin{matrix} x_1 = b_1 \\ x_2=b_1+b_2 \\ x_3=b_1+b_2+b_3 \end{matrix}\).

The equations can be solved in order (top to bottom) because \(A\) is a triangular matrix.

This matrix \(A\) is “invertible”. From \(\b\) we can recover \(\x\). We write \(\x\) as \(A^{-1}\b\).

The Inverse Matrix

\(A\x = \b\) is solved by

\[\begin{split}\bb x_1 \\ x_2 \\ x_3 \eb = \bb b_1 \\ b_1 + b_2 \\ b_1 + b_2 + b_3 \eb = \bb 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \eb \bb b_1 \\ b_2 \\ b_3 \eb.\end{split}\]
  1. For every \(\b\) there is one solution to \(A\x = \b\).

  2. The matrix \(A^{-1}\) produces \(\x = A^{-1}\b\).

Note on calculus. The vector \(\x\) changes to a function \(x(t)\). The differences \(A\x\) become the derivative \(dx/dt = b(t)\). The sums \(A^{-1}\b\) become the integral of \(b(t)\). Sums of differences are like integrals of derivatives.

Fundamental Theorem of Calculus: integration is the inverse of differentiation.

\(\bs{Ax=b}\) and \(\bs{x=A^{-1}b}\):

\[\dp \frac{dx}{dt}=b \ \mathrm{and}\ x(t) = \int_{0}^{t}b\ dt.\]

Centered difference of \(\bs{x(t)=t^2}\):

\[\dp \frac{(t+1)^2 - (t-1)^2}{2} = 2t.\]

Cyclic Differences

Keeps the same columns \(\u\) and \(\v\) but changes \(\w\) to a new vector \(\w^*\):

\[\begin{split}\u = \bb 1 \\ -1 \\ 0 \eb \quad \v = \bb 0 \\ 1 \\ -1 \eb \quad \w^* = \bb -1 \\ 0 \\ 1 \eb\end{split}\]

The linear combinations of \(\u, \v, \w^*\) lead to a cyclic difference matrix \(C\):

Note

Cyclic: \(C\x = \bb 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \eb \bb x_1 \\ x_2 \\ x_3 \eb = \bb x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \eb = \b\)

The three equations either have infinitely many solutions (sometimes) or else no solutions (usually).

All linear combinations \(x_1\u + x_2\v + x_3\w^*\) lie on the plane given by \(b_1 + b_2 + b_3 = 0\).

Independence and Dependence

The key question is whether the third vector is in that plane:

Independence: \(\w\) is not in the plane of \(\u\) and \(\v\).

Dependence: \(\w^*\) is in the plane of \(\u\) and \(\v\).

\(\u, \v, \w\) are independent. No combination except \(0\u + 0\v + 0\w = \bs{0}\) gives \(\b = \bs{0}\).

\(\u, \v, \w^*\) are dependent. Other combinations like \(\u + \v + \w^*\) give \(\b = \bs{0}\).

The vectors go into the columns of an \(n\) by \(n\) matrix:

Independent columns: \(A\x = \bs{0}\) has one solution. \(A\) is an invertible matrix.

Dependent columns: \(C\x = \bs{0}\) has many solutions. \(C\) is a singular matrix.
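A small NumPy sketch contrasting the two cases: the difference matrix \(A\) has independent columns, while the cyclic matrix \(C\) sends every vector \((c,c,c)\) to zero.

    import numpy as np

    A = np.array([[ 1, 0, 0],
                  [-1, 1, 0],
                  [ 0,-1, 1]])
    C = np.array([[ 1, 0,-1],
                  [-1, 1, 0],
                  [ 0,-1, 1]])

    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(C))   # 3 and 2: A invertible, C singular
    print(C @ np.array([1, 1, 1]))                              # (0, 0, 0): dependent columns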


Chapter 2 Solving Linear Equations


Chapter 2.1 Vectors and Linear Equations

Note

Two equations, Two unknowns: \(\begin{matrix} x - 2y = 1 \\ 3x + 2y = 11 \end{matrix}\)

The point \((3,1)\) lies on both lines and solves both equations.

Note

ROWS: The row picture shows two lines meeting at a single point (the solution).

Separating the original system into its columns instead of its rows, we get a vector equation:

Combination equals \(\b\)

\[\begin{split}x \bb 1\\3 \eb + y \bb -2 \\2 \eb = \bb 1 \\ 11 \eb = \b.\end{split}\]

The problem is to find the combination of those vectors that equals the vector on the right. The right choices produce \(3(\bs{col1})+1(\bs{col2})=\b\).

Note

COLUMNS: The column picture combines the column vectors on the left side to produce the vector \(\b\) on the right side.

The left side of the vector equation is a linear combination of the columns.

Note

Linear combination: \(3\bb 1\\3 \eb + \bb -2\\2 \eb = \bb 1\\11 \eb\).

The coefficient matrix on the left side of the equations is the 2 by 2 matrix \(A\):

\[\begin{split}A = \bb 1 & -2 \\ 3 & 2 \eb\end{split}\]

Note

Matrix equation \(A\x=\b\): \(\bb 1 & -2 \\ 3 & 2 \eb \bb x\\y \eb = \bb 1\\11 \eb\)

Note

Dot products with rows, Combination of columns:

  • \(A\x=\b\ \) is \(\ \bb 1 & -2 \\ 3 & 2 \eb \bb 3\\1 \eb = \bb 1\\11 \eb\)
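Solving this 2 by 2 system numerically shows the row picture and the column picture agree; a minimal sketch:

    import numpy as np

    A = np.array([[1, -2],
                  [3,  2]])
    b = np.array([1, 11])

    x = np.linalg.solve(A, b)             # the meeting point of the two lines
    print(x)                              # [3. 1.]
    print(x[0]*A[:, 0] + x[1]*A[:, 1])    # 3(col 1) + 1(col 2) reproduces b = [1 11]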

Four steps to understanding elimination using matrices:

  1. Elimination goes from \(A\) to a triangular \(U\) by a sequence of matrix steps \(E_{ij}\).

  2. The triangular system is solved by back substitution: working bottom to top.

  3. In matrix language \(A\) is factored into \(LU =\) (lower triangular) (upper triangular).

  4. Elimination succeeds if \(A\) is invertible. (But it may need row exchanges.)

Three Equations in Three Unknowns

The three unknowns are \(x,y,z\). We have three linear equations:

\[\begin{split}A\x = \b \quad \begin{matrix} x + 2y + 3z = 6 \\ 2x + 5y + 2z = 4 \\ 6x - 3y + z = 2 \end{matrix}\end{split}\]

Before solving the problem, we visualize it both ways:

ROW: The row picture shows three planes meeting at a single point.

COLUMN: The column picture combines three columns to produce \(\b = (6,4,2)\).

The column picture starts with the vector form of the equations \(A\x = \b\):

Combine columns:

\[\begin{split}x\bb 1\\2\\6 \eb + y\bb 2\\5\\-3 \eb + z\bb 3\\2\\1 \eb = \bb 6\\4\\2 \eb = \b.\end{split}\]

Correct combination \((x,y,z)=(\bs{0},\bs{0},\bs{2})\):

\[\begin{split}\bs{0}\bb 1\\2\\6 \eb + \bs{0}\bb 2\\5\\-3 \eb + \bs{2}\bb 3\\2\\1 \eb = \bb 6\\4\\2 \eb.\end{split}\]

The Matrix Form of the Equations

The “coefficient matrix” in \(A\x = \b\)

\[\begin{split}A = \bb 1&2&3 \\ 2&5&2 \\ 6&-3&1 \eb.\end{split}\]

Matrix equation \(A\x = \b\)

\[\begin{split}\bb 1&2&3 \\ 2&5&2 \\ 6&-3&1 \eb \bb x\\y\\z \eb = \bb 6\\4\\2 \eb.\end{split}\]

What does it mean to multiply \(\bs{A}\) times \(\x\) ?

  • Multiplication by rows: \(A\x\) comes from dot products, each row times the column \(\x\):

    \[\begin{split}A\x = \bb (\bs{row1})\cd\x \\ (\bs{row2})\cd\x \\ (\bs{row3})\cd\x \eb.\end{split}\]
  • Multiplication by columns: \(A\x\) is a combination of column vectors:

    \[A\x = x(\bs{col1}) + y(\bs{col2}) + z(\bs{col3}).\]

This book sees \(A\x\) as a combination of the columns of \(A\).

Identity matrix always yields the multiplication \(I\x = \x\).

\[\begin{split}I = \bb 1&0&0 \\ 0&1&0 \\ 0&0&1 \eb\end{split}\]

Matrix Notation

For convenience, we type \(A(i,j)\) instead of \(a_{ij}\). The entry \(a_{57} = A(5,7)\) would be in row 5, column 7.

Multiplication in MATLAB

Note

MATLAB Matrix multiplication: \(\b = A * \x\).

Row at a time

\[\b = [A(1,:)*\x; A(2,:)*\x; A(3,:)*\x]\]

Column at a time

\[\b = A(:,1)*x(1) + A(:,2)*x(2) + A(:,3)*x(3)\]
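An equivalent sketch in NumPy (indices shifted to 0-based), multiplying the same \(A\) and \(\x=(0,0,2)\) from this section row at a time and column at a time:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [2, 5, 2],
                  [6,-3, 1]])
    x = np.array([0, 0, 2])

    b_rows = np.array([A[i, :] @ x for i in range(3)])      # row at a time (dot products)
    b_cols = A[:, 0]*x[0] + A[:, 1]*x[1] + A[:, 2]*x[2]     # column at a time (combination)
    print(b_rows, b_cols, A @ x)                            # all give (6, 4, 2)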

Programming Languages for Mathematics and Statistics


Chapter 2.2 The Idea of Elimination

For 2 by 2 equations, before elimination, \(x\) and \(y\) appear in both equations. After elimination, the first unknown \(x\) has disappeared from the second equation.

Before:

\[ \begin{align}\begin{aligned}x - 2y &= 1\\3x + 2y &= 11\end{aligned}\end{align} \]

After (multiply equation 1 by 3, subtract to eliminate \(3x\)):

\[ \begin{align}\begin{aligned}x - 2y &= 1\\8y &= 8\end{aligned}\end{align} \]

The goal of elimination is to produce an upper triangular system. The system is solved from the bottom upwards. This process is called back substitution; it is used for upper triangular systems of any size, after elimination gives a triangle.

If the first equation is changed to \(4x - 8y = 4\), the multiplier changes to \(l = \frac{3}{4}\). To find the multiplier, divide the coefficient 3 to be eliminated by the pivot 4:

\[ \begin{align}\begin{aligned}\bs{4}x - 8y &= 4\\\bs{3}x + 2y &= 11\end{aligned}\end{align} \]

Multiply equation 1 by 3/4 and subtract from equation 2:

\[ \begin{align}\begin{aligned}4x - 8y &= 4\\8y &= 8\end{aligned}\end{align} \]

Note

Pivot: first nonzero in the row that does the elimination.

Multiplier: (entry to eliminate) divided by (pivot).

To solve \(n\) equations we want \(n\) pivots. The pivots are on the diagonal of the triangle after elimination.

Breakdown of Elimination

  • Permanent failure with no solution.

  • Failure with infinitely many solutions.

  • Temporary failure (zero in pivot). A row exchange produces two pivots.

Failure: For \(n\) equations we do not get \(n\) pivots.

Elimination leads to an equation \(0 =\) (nonzero number), which has no solution, or to \(0 = 0\), which allows many solutions.

Success comes with \(\bs{n}\) pivots. But we may have to exchange the \(\bs{n}\) equations.

Three Equations in Three Unknowns

\[ \begin{align}\begin{aligned}2x + 4y - 2z &= 2\\4x + 9y - 3z &= 8\\-2x - 3y + 7z &= 10\end{aligned}\end{align} \]

Step 1: Subtract 2 times equation 1 from equation 2. This leaves \(y+z=4\).

Step 2: Subtract -1 times equation 1 from equation 3. This leaves \(y+5z=12\).

\(x\) is eliminated: \(\begin{matrix} 1y+1z=4 \\ 1y+5z=12 \end{matrix}\)

Step 3: Subtract equation 2new from 3new. The multiplier is \(1/1=1\). Then \(4z=8\).

The original \(A\x = \b\) has been converted into an upper triangular \(U\x = \bs{c}\):

\[ \begin{align}\begin{aligned}2x + 4y - 2z &= 2\\1y + 1z &= 4\\4z &= 8\end{aligned}\end{align} \]

The solution is \((x,y,z) = (-1,2,2)\).
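A minimal sketch of forward elimination and back substitution for this system, recording the multipliers 2, -1, 1 used above:

    import numpy as np

    A = np.array([[ 2., 4.,-2.],
                  [ 4., 9.,-3.],
                  [-2.,-3., 7.]])
    b = np.array([2., 8., 10.])

    U, c = A.copy(), b.copy()
    for j in range(3):                       # create zeros below pivot j
        for i in range(j + 1, 3):
            l = U[i, j] / U[j, j]            # multiplier = entry to eliminate / pivot
            U[i, :] -= l * U[j, :]
            c[i]    -= l * c[j]

    x = np.zeros(3)
    for i in range(2, -1, -1):               # back substitution, bottom to top
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    print(x)                                 # [-1.  2.  2.]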

Elimination from \(A\) to \(U\)

For a 4 by 4 problem, or an \(n\) by \(n\) problem, elimination proceeds in the same way.

Column 1: Use the first equation to create zeros below the first pivot.

Column 2: Use the new equation 2 to create zeros below the second pivot.

Column 3 to \(\bs{n}\): Keep going to find all \(n\) pivots and the upper triangular \(U\).

After column 2 we have \(\bb x&x&x&x \\ 0&x&x&x \\ 0&0&x&x \\ 0&0&x&x \eb\). We want \(\bb x&x&x&x \\ &x&x&x \\ &&x&x \\ &&&x \eb\).

The result of forward elimination is an upper triangular system. It is nonsingular if there is a full set of \(n\) pivots.


Chapter 2.3 Elimination Using Matrices

  1. To see how each step is a matrix multiplication.

  2. To assemble all those steps \(E_{ij}\) into one elimination matrix \(E\).

  3. To see how each \(E_{ij}\) is inverted by its inverse matrix \(E_{ij}^{-1}\).

  4. To assemble all those inverse \(E_{ij}^{-1}\) (in the right order) into \(L\).

Matrix times Vectors and \(A\boldsymbol{x} = \boldsymbol{b}\)

The 3 by 3 example in the previous section has the short form \(A\x=\b\):

\[ \begin{align}\begin{aligned}2x_1+4x_2-2x_3 &= 2\\4x_1+9x_2-3x_3 &= 8\\-2x_1-3x_2+7x_3 &= 10\end{aligned}\end{align} \]

is the same as

\[\begin{split}\bb 2&4&-2 \\ 4&9&-3 \\-2&-3&7 \eb \bb x_1\\x_2\\x_3 \eb = \bb 2\\8\\10 \eb.\end{split}\]

The unknown is \(\x = \bb x_1\\x_2\\x_3 \eb\) and the solution is \(\x = \bb -1\\2\\2 \eb\).

Column form

\[\begin{split}A\x = (-1) \bb 2\\4\\-2 \eb + 2 \bb 4\\9\\-3 \eb + 2 \bb -2\\-3\\7 \eb = \bb 2\\8\\10 \eb = \b.\end{split}\]

\(A\x\) is a combination of the columns of \(A\). We use the row form of matrix multiplication. Components of \(A\x\) are dot products with rows of \(A\).

The first component of \(A\x\) above is \((-1)(2) + (2)(4) + (2)(-2)\).

The \(i\)th component of \(A\x\) is \((row\ i) \cd \x = a_{i1}x_1 + a_{i2}x_2 + \cds + a_{in}x_n = \Sg_{j=1}^n a_{ij}x_j\).

General rule: \(A_{ij} = A(i,j)\) is in row i, column j.

The Matrix Form of One Elimination Step

2 times the first equation is subtracted from the second equation. On the right side, 2 times the first component of \(\b\) is subtracted from the second component.

First step: \(\b = \bb 2\\8\\10 \eb\) changes to \(\b_{\mathrm{new}} = \bb 2\\4\\10 \eb\).

The same result \(\b_{\mathrm{new}} = E\b\) is achieved when we multiply an “elimination matrix” \(E\) times \(\b\). It subtracts \(2b_1\) from \(b_2\):

Note

The elimination matrix is \(E = \bb 1&0&0 \\ -2&1&0 \\ 0&0&1 \eb\).

Multiplication by \(E\) subtracts 2 times row 1 from row 2. Rows 1 and 3 stay the same:

\[\begin{split}\bb 1&0&0 \\ -2&1&0 \\ 0&0&1 \eb \bb 2\\8\\10 \eb = \bb 2\\4\\10 \eb \quad \bb 1&0&0\\-2&1&0\\0&0&1 \eb\bb b_1\\b_2\\b_3 \eb=\bb b_1\\b_2-2b_1\\b_3 \eb\end{split}\]

Note

The identity matrix has 1’s on the diagonal and otherwise 0’s. Then \(I\b = \b\) for all \(\b\). The elementary matrix or elimination matrix \(E_{ij}\) has the extra nonzero entry \(-l\) in the \(i,j\) position. Then \(E_{ij}\) subtracts a multiple \(l\) of row \(j\) from row \(i\).

Matrix Multiplication

How do we multiply two matrices?

\[\begin{split}EA = \bb 1&0&0 \\ -2&1&0 \\ 0&0&1 \eb \bb 2&4&-2 \\ 4&9&-3 \\ -2&-3&7 \eb = \bb 2&4&-2 \\ 0&1&1 \\ -2&-3&7 \eb.\end{split}\]

Tip

The first was \(E\) times \(A\x\), the second is \(EA\) times \(\x\). They are the same.
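A quick NumPy check that \(E = E_{21}\) subtracts 2 times row 1 from row 2, and that \(E(A\x)\) equals \((EA)\x\):

    import numpy as np

    E = np.array([[ 1, 0, 0],
                  [-2, 1, 0],
                  [ 0, 0, 1]])
    A = np.array([[ 2, 4,-2],
                  [ 4, 9,-3],
                  [-2,-3, 7]])
    b = np.array([2, 8, 10])
    x = np.array([-1, 2, 2])

    print(E @ b)                       # [2 4 10]: 2 b1 subtracted from b2
    print(E @ A)                       # row 2 becomes [0 1 1]; rows 1 and 3 unchanged
    print(E @ (A @ x), (E @ A) @ x)    # the associative law: the same vector both ways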

Note

Associative law is true: \(A(BC) = (AB)C\)

Commutative law is false: Often \(AB \neq BA\)

Tip

If \(B\) has several columns \(b_1, b_2, b_3\), then the columns of \(EB\) are \(Eb_1, Eb_2, Eb_3\).

Note

Matrix multiplication: \(AB = A\bb \b_1&\b_2&\b_3 \eb = \bb A\b_1&A\b_2&A\b_3 \eb\).

The Matrix \(P_{ij}\) for a Row Exchange

Permutation matrix

\[\begin{split}P_{23} = \bb 1&0&0 \\ 0&0&1 \\ 0&1&0 \eb.\end{split}\]

Note

Row Exchange Matrix: \(P_{ij}\) is the identity matrix with rows \(i\) and \(j\) reversed. When this “permutation matrix” \(P_{ij}\) multiplies a matrix, it exchanges rows \(i\) and \(j\).

The Augmented Matrix

Elimination does the same row operations to \(A\) and to \(\b\). We can include \(\b\) as an extra column and follow it through elimination. The matrix \(A\) is enlarged or “augmented” by the extra column \(\b\).

Note

Augmented matrix: \(\bb A&\b \eb = \bb 2&4&-2&2 \\ 4&9&-3&8 \\-2&-3&7&10 \eb\).

Elimination acts on whole rows of this matrix.

\[\begin{split}\bb 1&0&0 \\ -2&1&0 \\ 0&0&1 \eb \bb 2&4&-2&2 \\ 4&9&-3&8 \\-2&-3&7&10 \eb = \bb 2&4&-2&2 \\ 0&1&1&4 \\-2&-3&7&10 \eb.\end{split}\]

Matrix multiplication works by rows and at the same time by columns:

ROWS: Each row of \(E\) acts on \(\bb A&\b \eb\) to give a row of \(\bb EA&E\b \eb\).

COLUMNS: \(E\) acts on each column of \(\bb A&\b \eb\) to give a column of \(\bb EA&E\b \eb\).

\(A\) goes to \(E_{21}A\) which goes to \(E_{31}E_{21}A\). Finally \(E_{32}E_{31}E_{21}A\) is a triangular matrix.


Chapter 2.4 Rules for Matrix Operations

The entry in row \(i\) and column \(j\) is called \(a_{ij}\) or \(A(i,j)\).

Tip

To multiply \(AB\): If \(A\) has \(n\) columns, \(B\) must have \(n\) rows.

Every column of \(B\) is multiplied by \(A\).

Note

Fundamental Law of Matrix Multiplication: \(AB\) times \(C\) equals \(A\) times \(BC\).

The parentheses can move safely in \((AB)C=A(BC)\).

Suppose \(A\) is \(m\) by \(n\) and \(B\) is \(n\) by \(p\). The product \(AB\) is \(m\) by \(p\).

\[(\bs{m} \times n)(n \times \bs{p}) = (\bs{m} \times \bs{p}).\]

A row times a column is a 1 by \(n\) matrix times an \(n\) by 1 matrix. The result is 1 by 1. That single number is the “dot product”.

Note

1. The entry in row \(i\) and column \(j\) of \(AB\) is (row \(i\) of \(A\)) \(\cd\) (column \(j\) of \(B\)).

If \(A\) and \(B\) are \(n\) by \(n\), so is \(AB\). It contains \(n^2\) dot products, row of \(A\) times column of \(B\). Each dot product needs \(n\) multiplications, so the computation of \(AB\) uses \(n^3\) separate multiplications.

The Second and Third Ways: Rows and Columns

Each column of \(AB\) is a combination of the columns of \(A\).

2. Matrix \(\bs{A}\) times every column of \(\bs{B}\):

\[A \bb \b_1 \cds \b_p \eb = \bb A\b_1 \cds A\b_p \eb.\]

Every row of \(AB\) is a combination of the rows of \(B\).

3. Every row of \(\bs{A}\) times matrix \(\bs{B}\):

\[\begin{split}\bb \bs{\mathrm{row}}\ i\ \bs{\mathrm{of}}\ A \eb \bb 1&2&3 \\ 4&5&6 \\ 7&8&9 \eb = \bb \bs{\mathrm{row}}\ i\ \bs{\mathrm{of}}\ AB \eb.\end{split}\]

\(AB = (m \times n)(n \times p) = (m \times p)\): \(mp\) dot products with \(n\) steps each.

The Fourth Way: Columns Multiply Rows

4. Multiply columns 1 to \(\bs{n}\) of \(\bs{A}\) times rows 1 to \(\bs{n}\) of \(\bs{B}\). Add those matrices.

\[ \begin{align}\begin{aligned}\begin{split}\bb \bs{\mathrm{col\ 1}}&\bs{\mathrm{col\ 2}}&\bs{\mathrm{col\ 3}}\\ \cd&\cd&\cd \\ \cd&\cd&\cd \eb \bb \bs{\mathrm{row\ 1}}&\cd&\cd \\ \bs{\mathrm{row\ 2}}&\cd&\cd \\ \bs{\mathrm{row\ 3}}&\cd&\cd \eb\end{split}\\= (\bs{\mathrm{col\ 1}})(\bs{\mathrm{row\ 1}}) + (\bs{\mathrm{col\ 2}})(\bs{\mathrm{row\ 2}}) + (\bs{\mathrm{col\ 3}})(\bs{\mathrm{row\ 3}}).\end{aligned}\end{align} \]

For 2 by 2 matrices:

\[\begin{split}AB = \bb a&b\\c&d \eb \bb E&F\\G&H \eb = \bb aE+bG&aF+bH \\ cE+dG&cF+dH \eb\end{split}\]

Note

Add columns of \(A\) times rows of \(B\): \(AB = \bb a\\c \eb \bb E&F \eb + \bb b\\d \eb \bb G&H \eb\)

This uses the same \(mnp\) steps as in the dot products but in a new order.

The Laws for Matrix Operations

Laws that matrix operations obey:

  • Commutative law: \(A+B=B+A\)

  • Distributive law: \(c(A+B) = cA+cB\)

  • Associative law: \(A+(B+C) = (A+B)+C\)

  • Distributive law from the left: \(A(B+C) = AB+AC\)

  • Distributive law from the right: \((A+B)C = AC + BC\)

  • Associative law for \(ABC\) (parentheses not needed): \(A(BC) = (AB)C\)

The commutative “law” is usually broken: \(AB \neq BA\).

Note

\(A^p = AAA\cds A\) (\(p\) factors): \((A^p)(A^q) = A^{p+q} \quad (A^p)^q = A^{pq}\)

Block Matrices and Block Multiplication

4 by 6 matrix, 2 by 2 blocks give 2 by 3 block matrix:

\[\begin{split}A = \left[\begin{array}{cc|cc|cc} 1&0&1&0&1&0 \\ 0&1&0&1&0&1 \\ \hline 1&0&1&0&1&0 \\ 0&1&0&1&0&1 \end{array}\right] = \bb I&I&I \\ I&I&I \eb\end{split}\]

Note

Block multiplication: If blocks of \(A\) can multiply blocks of \(B\), then block multiplication of \(AB\) is allowed. Cuts between columns of \(A\) match cuts between rows of \(B\).

  • \(\bb A_{11}&A_{12} \\ A_{21}&A_{22} \eb \bb B_{11}\\B_{21} \eb = \bb A_{11}B_{11}+A_{12}B_{21} \\ A_{21}B_{11}+A_{22}B_{21} \eb\)

Important special case: Let the blocks of \(A\) be its \(n\) columns. Let the blocks of \(B\) be its \(n\) rows. Then block multiplication \(AB\) adds up columns times rows:

\[\begin{split}\bb |&&| \\ a_1&\cds &a_n \\ |&&| \eb \bb - &b_1& - \\ & \vdots & \\ - &b_n& - \eb = \bb a_1b_1 + \cds + a_nb_n \eb.\end{split}\]

For example,

\[\begin{split}\bb 1&4\\1&5 \eb\bb 3&2\\1&0 \eb =\bb 1\\1 \eb\bb 3&2 \eb+\bb 4\\5 \eb\bb 1&0 \eb =\bb 3&2\\3&2 \eb+\bb 4&0\\5&0 \eb=\bb 7&2\\8&2 \eb.\end{split}\]

The usual way, rows times columns, gives four dot products (8 multiplications). The new way, columns times rows, gives two full matrices (the same 8 multiplications).
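A numeric check of this fourth way (columns of \(A\) times rows of \(B\), then add) on the 2 by 2 example above:

    import numpy as np

    A = np.array([[1, 4],
                  [1, 5]])
    B = np.array([[3, 2],
                  [1, 0]])

    outer_sum = np.outer(A[:, 0], B[0, :]) + np.outer(A[:, 1], B[1, :])   # (col 1)(row 1) + (col 2)(row 2)
    print(outer_sum)      # [[7 2] [8 2]]
    print(A @ B)          # the same answer by rows times columns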

Elimination by blocks: Suppose a matrix has four blocks \(A, B, C, D\):

\[\begin{split}\left[\begin{array}{c|c} I & 0 \\ \hline -CA^{-1} & I \end{array}\right] \left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right] = \left[\begin{array}{c|c} A & B \\ \hline 0 & D-CA^{-1}B \end{array}\right]\end{split}\]

The final block is \(D-CA^{-1}B\), just like \(d-cb/a\). This is called the Schur complement.
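A sketch of block elimination on randomly generated 2 by 2 blocks (chosen only for illustration), checking that the lower-right block after elimination is the Schur complement \(D-CA^{-1}B\):

    import numpy as np

    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
    C, D = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))

    M = np.block([[A, B],
                  [C, D]])
    E = np.block([[np.eye(2),             np.zeros((2, 2))],
                  [-C @ np.linalg.inv(A), np.eye(2)]])

    schur = D - C @ np.linalg.inv(A) @ B
    print(np.allclose((E @ M)[2:, :2], 0))       # True: the C block is eliminated
    print(np.allclose((E @ M)[2:, 2:], schur))   # True: what remains is D - C A^{-1} B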


Chapter 2.5 Inverse Matrices

Suppose \(A\) is a square matrix. We look for an “inverse matrix” \(A^{-1}\) of the same size, such that \(A^{-1}\) times \(A\) equals \(I\). But \(A^{-1}\) might not exist.

Note

DEFINITION: The matrix \(A\) is invertible if there exists a matrix \(A^{-1}\) that “inverts” \(A\):

  • Two-sided inverse: \(A^{-1}A = I\) and \(AA^{-1} = I\).

Not all matrices have inverses:

  • Note 1: The inverse exists if and only if elimination produces \(n\) pivots (row exchanges are allowed).

  • Note 2: The matrix \(A\) cannot have two different inverses. Suppose \(BA=I\) and also \(AC=I\). Then \(B =C\):

    \[B(AC) = (BA)C\ \mathrm{gives}\ BI=IC\ \mathrm{or}\ B=C.\]

    This shows that a left-inverse \(B\) and a right-inverse \(C\) must be the same matrix.

  • Note 3: If \(A\) is invertible, the one and only solution to \(A\x=\b\) is \(\x=A^{-1}\b\):

Note

Multiply \(A\x=\b\) by \(A^{-1}\). Then \(\x = A^{-1}A\x = A^{-1}\b\).

  • Note 4: Suppose there is a nonzero vector \(\x\) such that \(A\x = \bs{0}\). Then \(A\) cannot have an inverse. No matrix can bring \(\bs{0}\) back to \(\x\).

Tip

If \(A\) is invertible, then \(A\x=\bs{0}\) can only have the zero solution \(\x=A^{-1}\bs{0}=\bs{0}\).

  • Note 5: A 2 by 2 matrix is invertible if and only if \(ad-bc\) is not zero:

    2 by 2 Inverse:

    \[\begin{split}\bb a&b\\c&d \eb^{-1} = \frac{1}{ad-bc} \bb d&-b\\-c&a \eb.\end{split}\]

    The number \(ad-bc\) is the determinant of \(A\). A matrix is invertible if its determinant is not zero.

  • Note 6: A diagonal matrix has an inverse provided no diagonal entries are zero:

    \[\begin{split}\mathrm{If}\ A = \bb d_1 \\ & \ddots \\ && d_n \eb \ \mathrm{then}\ A^{-1} = \bb 1/d_1 \\ & \ddots \\ && 1/d_n \eb.\end{split}\]

The Inverse of a Product \(AB\)

The product \(AB\) has an inverse, if and only if the two factors \(A\) and \(B\) are separately invertible (and the same size).

Note

If \(A\) and \(B\) are invertible then so is \(AB\). The inverse of a product \(AB\) is

  • \((AB)^{-1}=B^{-1}A^{-1}\).

Inverse of AB:

\[(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I.\]

Reverse order:

\[(ABC)^{-1} = C^{-1}B^{-1}A^{-1}.\]

Inverse of an elimination matrix:

\[\begin{split}E = \bb 1&0&0\\-5&1&0\\0&0&1 \eb\ \mathrm{and}\ E^{-1} = \bb 1&0&0\\5&1&0\\0&0&1 \eb.\end{split}\]

For square matrices, an inverse on one side is automatically an inverse on the other side.

\[ \begin{align}\begin{aligned}\begin{split}F = \bb 1&0&0\\0&1&0\\0&-4&1 \eb\ \mathrm{and}\ F^{-1} = \bb 1&0&0\\0&1&0\\0&4&1 \eb.\end{split}\\\begin{split}FE = \bb 1&0&0\\-5&1&0\\20&-4&1 \eb\ \mathrm{is\ inverted\ by}\ E^{-1}F^{-1} = \bb 1&0&0\\5&1&0\\0&4&1 \eb.\end{split}\end{aligned}\end{align} \]
  • In this order \(FE\), row 3 feels an effect from row 1.

  • In this order \(E^{-1}F^{-1}\), row 3 feels no effect from row 1.

Note

In elimination order \(F\) follows \(E\). In reverse order \(E^{-1}\) follows \(F^{-1}\). \(\bs{E^{-1}F^{-1}}\) is quick. The multipliers 5, 4 fall into place below the diagonal of 1’s.

Calculating \(A^{-1}\) by Gauss-Jordan Elimination

Each of the columns \(\bs{x_1}, \bs{x_2}, \bs{x_3}\) of \(A^{-1}\) is multiplied by \(A\) to produce a column of \(I\):

Note

3 columns of \(A^{-1}\): \(AA^{-1}=A\bb \bs{x_1}&\bs{x_2}&\bs{x_3} \eb = \bb \bs{e_1}&\bs{e_2}&\bs{e_3} \eb = I\).

The Gauss-Jordan method computes \(A^{-1}\) by solving all \(n\) equations together.

Start Gauss-Jordan on \(K\)

\[\begin{split}\bb K&\bs{e_1}&\bs{e_2}&\bs{e_3} \eb = \bb 2&-1&0&1&0&0 \\ -1&2&-1&0&1&0 \\ 0&-1&2&0&0&1 \eb\end{split}\]

(\(\frac{1}{2}\)row 1 + row 2)

\[\begin{split}\rightarrow \bb 2&-1&0&1&0&0 \\ 0&\frac{3}{2}&-1&\frac{1}{2}&1&0 \\ 0&-1&2&0&0&1 \eb\end{split}\]

(\(\frac{2}{3}\)row 2 + row 3)

\[\begin{split}\rightarrow \bb 2&-1&0&1&0&0 \\ 0&\frac{3}{2}&-1&\frac{1}{2}&1&0 \\ 0&0&\frac{4}{3}&\frac{1}{3}&\frac{2}{3}&1 \eb\end{split}\]

The matrix in the first three columns is \(U\) (upper triangular). Jordan goes all the way to the reduced echelon form \(\bs{R=I}\). Rows are added to rows above them, to produce zeros above the pivots.

(Zero above third pivot) (\(\frac{3}{4}\)row 3 + row 2)

\[\begin{split}\rightarrow \bb 2&-1&0&1&0&0 \\ 0&\frac{3}{2}&0&\frac{3}{4}&\frac{3}{2}&\frac{3}{4} \\ 0&0&\frac{4}{3}&\frac{1}{3}&\frac{2}{3}&1 \eb\end{split}\]

(Zero above second pivot) (\(\frac{2}{3}\)row 2 + row 1)

\[\begin{split}\rightarrow \bb 2&0&0&\frac{3}{2}&1&\frac{1}{2} \\ 0&\frac{3}{2}&0&\frac{3}{4}&\frac{3}{2}&\frac{3}{4} \\ 0&0&\frac{4}{3}&\frac{1}{3}&\frac{2}{3}&1 \eb\end{split}\]

The three columns of \(K^{-1}\) are in the second half of \(\bb I & K^{-1} \eb\):

\[\begin{split}\begin{matrix} (\mathrm{divide\ by\ }2) \\ (\mathrm{divide\ by\ }\frac{3}{2}) \\ (\mathrm{divide\ by\ }\frac{4}{3}) \end{matrix}\quad \bb 1&0&0&\frac{3}{4}&\frac{1}{2}&\frac{1}{4} \\ 0&1&0&\frac{1}{2}&1&\frac{1}{2} \\ 0&0&1&\frac{1}{4}&\frac{1}{2}&\frac{3}{4} \eb = \bb I&\bs{x_1}&\bs{x_2}&\bs{x_3} \eb = \bb I & K^{-1} \eb.\end{split}\]

Note

Gauss-Jordan: Multiply \(\bb \bs{A}&\bs{I} \eb\) by \(\bs{A^{-1}}\) to get \(\bb \bs{I}&\bs{A^{-1}} \eb\).

Observations about \(K^{-1}\):

  1. \(K\) is symmetric across its main diagonal. Then \(K^{-1}\) is also symmetric.

  2. \(K\) is tridiagonal (only three nonzero diagonals). But \(K^{-1}\) is a dense matrix with no zeros. The inverse of a band matrix is generally a dense matrix.

  3. The product of pivots is \(2(\frac{3}{2})(\frac{4}{3})=4\). This number 4 is the determinant of \(K\).

\(\bs{K^{-1}}\) involves division by the determinant of \(\bs{K}\):

\[\begin{split}K^{-1} = \frac{1}{4} \bb 3&2&1 \\ 2&4&2 \\ 1&2&3 \eb.\end{split}\]
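A quick check of this inverse in NumPy (np.linalg.inv uses an LU-based routine rather than literal Gauss-Jordan, but it returns the same \(K^{-1}\)):

    import numpy as np

    K = np.array([[ 2,-1, 0],
                  [-1, 2,-1],
                  [ 0,-1, 2]])

    K_inv_expected = np.array([[3, 2, 1],
                               [2, 4, 2],
                               [1, 2, 3]]) / 4
    print(np.allclose(np.linalg.inv(K), K_inv_expected))   # True: matches the formula above
    print(np.linalg.det(K))                                # 4.0 = product of the pivots 2, 3/2, 4/3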

Tip

This is why an invertible matrix cannot have a zero determinant: we need to divide.

If \(A\) is invertible and upper triangular, so is \(A^{-1}\).

The total cost for \(A^{-1}\) using Gauss-Jordan elimination is \(n^3\) multiplications and subtractions.

Tip

To solve \(Ax=b\) without \(A^{-1}\), we deal with one column \(b\) to find one column \(x\).

Singular versus Invertible

\(\bs{A^{-1}}\) exists exactly when \(\bs{A}\) has a full set of \(\bs{n}\) pivots. (Row exchanges are allowed.) The proof uses Gauss-Jordan elimination:

  1. With \(n\) pivots, elimination solves all the equations \(A\x_i=\bs{e}_i\). The columns \(\x_i\) go into \(A^{-1}\). Then \(AA^{-1}=I\) and \(A^{-1}\) is at least a right-inverse.

  2. Elimination is really a sequence of multiplication by \(E\)’s and \(P\)’s and \(D^{-1}\):

Left-inverse \(C\):

\[CA = (D^{-1}\cds E \cds P \cds E)A = I.\]

\(D^{-1}\) divides by the pivots. The matrices \(E\) produce zeros below and above the pivots. \(P\) exchanges rows if needed.

\(A\) must have \(n\) pivots if \(AC=I\):

  1. If \(A\) doesn’t have \(n\) pivots, elimination will lead to a zero row.

  2. Those elimination steps are taken by an invertible \(M\). So a row of \(MA\) is zero.

  3. If \(AC=I\) had been possible, then \(MAC=M\). The zero row of \(MA\), times \(C\), gives a zero row of \(M\) itself.

  4. An invertible matrix \(M\) can’t have a zero row! \(A\) must have \(n\) pivots if \(AC=I\).

Note

Elimination gives a complete test for invertibility of a square matrix. \(\bs{A^{-1}}\) exists when \(A\) has \(n\) pivots. The argument above shows more:

  • If \(AC=I\) then \(CA=I\) and \(C=A^{-1}\).

Tip

A triangular matrix is invertible if and only if no diagonal entries are zero.

Recognizing an Invertible Matrix

Diagonally dominant matrices are invertible. Each diagonal entry \(|a_{ii}|\) is larger than the sum of the absolute values of the other entries in row \(i\). On every row,

\[|a_{ii}| > \sum_{j \neq i}|a_{ij}|\ \mathrm{means\ that}\ |a_{ii}| > |a_{i1}| + \cds (\mathrm{skip}\ |a_{ii}|) \cds + |a_{in}|.\]

Take any nonzero vector \(\x\). Suppose its largest component is \(|x_i|\). Then \(A\x=\bs{0}\) is impossible, because row \(i\) of \(A\x=\bs{0}\) would need

\[a_{i1}x_1 + \cds + a_{ii}x_i + \cds + a_{in}x_n = 0.\]

Those can’t add to zero when \(A\) is diagonally dominant. The size of \(a_{ii}x_i\) is greater than all the other terms combined:

\[\mathrm{All}\ |x_j|\leq|x_i|\quad\sum_{j\neq i}|a_{ij}x_j|\leq \sum_{j\neq i}|a_{ij}||x_i|<|a_{ii}||x_i|\quad \mathrm{because}\ a_{ii}\ \mathrm{dominates}.\]

Chapter 2.6 Elimination = Factorization: \(A = LU\)

The factors \(L\) and \(U\) are triangular matrices. The factorization that comes from elimination is \(\bs{A=LU}\).

\(U\) is the upper triangular matrix with the pivots on its diagonal from elimination.

\(L\) is the lower triangular matrix whose entries are exactly the multipliers \(l_{ij}\): the multiplier that multiplied pivot row \(j\) when it was subtracted from row \(i\).

Forward from \(A\) to \(U\):

\[\begin{split}E_{21}A = \bb 1&0\\-3&1 \eb \bb 2&1\\6&8 \eb = \bb 2&1\\0&5 \eb = U.\end{split}\]

Back from \(U\) to \(A\):

\[\begin{split}E_{21}^{-1}A = \bb 1&0\\3&1 \eb \bb 2&1\\0&5 \eb = \bb 2&1\\6&8 \eb = A = LU.\end{split}\]

For larger matrices with many \(E\)’s, \(L\) will include all their inverses.

Note

\((E_{32}E_{31}E_{21})A=U\) becomes \(A=(E_{21}^{-1}E_{31}^{-1}E_{32}^{-1})U\) which is \(A=LU\).

Explanation and Examples

First point: Every inverse matrix \(E^{-1}\) is lower triangular. Its off-diagonal entry is \(l_{ij}\), to undo the subtraction produced by \(-l_{ij}\). The main diagonals of \(E\) and \(E^{-1}\) contain 1’s.

Second point: This lower triangular product of inverses is \(\bs{L}\).

Third point: Each multiplier \(l_{ij}\) goes directly into its \(i,j\) position unchanged in the product of inverses which is \(L\). \(L\) also has 1’s down its diagonal.

Note

\(\bs{A=LU}\): This is elimination without row exchanges. The upper triangular \(U\) has the pivots on its diagonal. The lower triangular \(L\) has all 1’s on its diagonal. The multipliers \(l_{ij}\) are below the diagonal of \(L\).

Tip

  • When a row of \(A\) starts with zeros, so does that row of \(L\).

  • When a column of \(A\) starts with zeros, so does that column of \(U\).

The key reason why \(A\) equals \(LU\):

\[\mathrm{Row\ 3\ of\ }U=(\mathrm{Row\ 3\ of\ } A)-l_{31}(\mathrm{Row\ 1\ of\ } U)-l_{32}(\mathrm{Row\ 2\ of\ } U)\]

Rewrite this equation to see that the row \(\bb l_{31} & l_{32} & 1 \eb\) is multiplying the matrix \(U\):

Note

\(\mathrm{Row\ 3\ of\ }A=1(\mathrm{Row\ 3\ of\ } U)+l_{31}(\mathrm{Row\ 1\ of\ } U)+l_{32}(\mathrm{Row\ 2\ of\ } U)\)

Better balance from LDU: Divide \(U\) by a diagonal matrix \(D\) that contains the pivots. That leaves a new triangular matrix with 1’s on the diagonal:

\[\begin{split}\mathrm{Split\ } U \mathrm{\ into\ } \bb d_1\\&d_2\\&&\ddots\\&&&d_n \eb \bb 1&u_{12}/d_1&u_{13}/d_1&\cd\\&1&u_{23}/d_2&\cd\\&&\ddots&\vdots\\&&&1 \eb.\end{split}\]

Note

The triangular factorization can be written \(\bs{A=LU}\) or \(\bs{A=LDU}\).
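A minimal sketch that produces \(L\), \(U\), and \(D\) by recording the multipliers during elimination (no row exchanges assumed), using the 3 by 3 matrix from Section 2.2:

    import numpy as np

    A = np.array([[ 2., 4.,-2.],
                  [ 4., 9.,-3.],
                  [-2.,-3., 7.]])

    n = len(A)
    L, U = np.eye(n), A.copy()
    for j in range(n):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]     # the multiplier l_ij goes straight into L
            U[i, :] -= L[i, j] * U[j, :]    # subtract l_ij times pivot row j from row i

    D  = np.diag(np.diag(U))                # diagonal matrix of the pivots 2, 1, 4
    U1 = np.linalg.inv(D) @ U               # divide rows of U by their pivots: 1's on the diagonal
    print(np.allclose(L @ U, A))            # True: A = LU
    print(np.allclose(L @ D @ U1, A))       # True: A = LDU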

One Square System = Two Triangular Systems

Note

1 Factor (into \(L\) and \(U\), by elimination on the left side matrix \(A\)).

2 Solve (forward elimination on \(\b\) using \(L\), then back substitution for \(\x\) using \(U\)).

Note

Forward and backward: Solve \(L\bs{c} = \b\) and then solve \(U\x = \bs{c}\).

\(LU\x = L\bs{c}\) is just \(A\x=\b\).
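A sketch of the two triangular solves for the system of Section 2.2, reusing the \(L\) and \(U\) found there (SciPy's solve_triangular is assumed available):

    import numpy as np
    from scipy.linalg import solve_triangular

    L = np.array([[ 1., 0., 0.],      # multipliers 2, -1, 1 below the diagonal
                  [ 2., 1., 0.],
                  [-1., 1., 1.]])
    U = np.array([[ 2., 4.,-2.],      # pivots 2, 1, 4 on the diagonal
                  [ 0., 1., 1.],
                  [ 0., 0., 4.]])
    b = np.array([2., 8., 10.])

    c = solve_triangular(L, b, lower=True)    # forward elimination: solve L c = b
    x = solve_triangular(U, c, lower=False)   # back substitution:   solve U x = c
    print(c, x)                               # c = [2 4 8],  x = [-1 2 2]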

The Cost of Elimination

Note

Elimination on \(A\) requires about \(\frac{1}{3}n^3\) multiplications and \(\frac{1}{3}n^3\) subtractions.

Note

Solve: Each right side needs \(n^2\) multiplications and \(n^2\) subtractions.

A band matrix \(B\) has only \(w\) nonzero diagonals below and above its main diagonal. The zero entries outside the band stay zero in elimination (they are zero in \(L\) and \(U\)).

Note

Band matrix: \(\bs{A}\) to \(\bs{U}\): \(\frac{1}{3}n^3\) reduces to \(nw^2\quad\) Solve: \(n^2\) reduces to \(2nw\).


Chapter 2.7 Transposes and Permutations

The transpose of \(A\) is denoted by \(A^T\). The columns of \(A^T\) are the rows of \(A\). When \(A\) is an \(m\) by \(n\) matrix, the transpose is \(n\) by \(m\):

\[\begin{split}\mathrm{If\ }A=\bb 1&2&3\\0&0&4\eb\mathrm{\ then\ }A^T=\bb 1&0\\2&0\\3&4\eb.\end{split}\]

Note

Exchange rows and columns: \((A^T)_{ij}=A_{ji}\).

The rules for transposes:

  • Sum: \((A+B)^T = A^T+B^T\)

  • Product: \((AB)^T = B^TA^T\)

  • Inverse: \((A^{-1})^T = (A^T)^{-1}\).

Tip

\(A\x\) combines the columns of \(A\) while \(\x^TA^T\) combines the rows of \(A^T\).

Transposing \(AB=\bb A\x_1&A\x_2&\cds\eb\) gives \(\bb\x_1^TA^T\\\x_2^TA^T\\\vdots\eb\) which is \(B^TA^T\).

The reverse order rule extends to three or more factors: \((ABC)^T=C^TB^TA^T\).

  • If \(A=LDU\) then \(A^T=U^TD^TL^T\). The pivot matrix has \(D=D^T\).

Transpose of inverse:

\[A\im A = I\ \mathrm{is\ transposed\ to\ } A^T(A\im)^T=I.\]

\(A^T\) is invertible exactly when \(A\) is invertible.

The Meaning of Inner Products

The dot product \(\x\cd\bs{y}\) is the sum of numbers \(x_iy_i\). Use matrix notation instead:

  • \(^T\) is inside: The dot product or inner product is \(\x^T\y\quad(1\times n)(n\times 1)\).

  • \(^T\) is outside: The rank one product or outer product is \(\x\y^T\quad(n\times 1)(1\times n)\).

\(\x^T\y\) is a number, \(\x\y^T\) is a matrix.

Examples where the inner product has meaning:

  • From mechanics: Work = (Movements) (Forces) = \(\x^T\bs{f}\)

  • From circuits: Heat loss = (Voltage drops) (Currents) = \(e^T\y\)

  • From economics: Income = (Quantities) (Prices) = \(\bs{q}^T\bs{p}\)

\(A^T\) is the matrix that makes these two inner products equal for every \(\x\) and \(\y\):

Tip

\((A\x)^T\y = \x^T(A^T\y)\) Inner product of \(A\x\) with \(\y =\) Inner product of \(\x\) with \(A^T\y\).

Symmetric Matrices

Note

DEFINITION: A symmetric matrix has \(S^T=S\). This means that \(s_{ji}=s_{ij}\).

The inverse of a symmetric matrix is also symmetric. The transpose of \(S\im\) is \((S\im)^T=(S^T)\im=S\im\). That says \(S\im\) is symmetric (when \(S\) is invertible).

Symmetric Products \(A^TA\) and \(AA^T\) and \(LDL^T\)

The product \(S=A^TA\) is automatically a square symmetric matrix:

Tip

The transpose of \(A^TA\) is \(A^T(A^T)^T\) which is \(A^TA\) again.

The \((i,j)\) entry of \(A^TA\) is the dot product of row \(i\) of \(A^T\) (column \(i\) of \(A\)) with column \(j\) of \(A\). The \((j,i)\) entry is the same dot product, column \(j\) with column \(i\). So \(A^TA\) is symmetric.

The product \(A^TA\) is \(n\) by \(n\). \(AA^T\) is \(m\) by \(m\). Both are symmetric, with positive diagonal. But even if \(m=n\), it is very likely that \(A^TA\neq AA^T\).

Symmetric matrices in elimination: the symmetry is in the triple product \(S=LDU\).

\[ \begin{align}\begin{aligned}\begin{split}\bb 1&2\\2&7 \eb = \bb 1&0\\2&1 \eb \bb 1&2\\0&3 \eb\end{split}\\\begin{split}\bb 1&2\\2&7 \eb = \bb 1&0\\2&1 \eb \bb 1&0\\0&3 \eb \bb 1&2\\0&1 \eb\end{split}\end{aligned}\end{align} \]

Note

If \(S=S^T\) is factored into \(LDU\) with no row exchanges, then \(U\) is exactly \(L^T\).

Tip

The symmetric factorization of a symmetric matrix is \(S=LDL^T\).

Permutation Matrices

The row exchanges \(P_{ij}\) are constructed by exchanging two rows \(i\) and \(j\) of \(I\).

Note

DEFINITION: A permutation matrix \(P\) has the rows of the identity \(I\) in any order.

There are \(n!\) permutation matrices of order \(n\). \(P\im\) is also a permutation matrix. \(P\im\) is always the same as \(P^T\).
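
A sketch verifying \(P\im=P^T\) for one 3 by 3 permutation (the row order 2, 0, 1 is an arbitrary choice):

    import numpy as np

    P = np.eye(3)[[2, 0, 1]]                     # rows of I in a new order
    assert np.allclose(np.linalg.inv(P), P.T)    # P^{-1} equals P^T
    assert np.allclose(P.T @ P, np.eye(3))       # equivalently P^T P = I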

The \(PA=LU\) Factorization with Row Exchanges

Sometimes row exchanges are needed to produce pivots. Then \(A=(E\im \cds P\im \cds E\im \cds P\im \cds)U\). Every row exchange is carried out by a \(P_{ij}\) and inverted by that \(P_{ij}\). We now compress those row exchanges into a single permutation matrix \(P\).

  1. The row exchanges can be done in advance. Their product \(P\) puts the rows of \(A\) in the right order, so that no exchanges are needed for \(PA\). Then \(\bs{PA=LU}\).

  2. If we hold row exchanges until after elimination, the pivot rows are in a strange order. \(P_1\) puts them in the correct triangular order in \(U_1\). Then \(\bs{A=L_1P_1U_1}\).

Note

If \(A\) is invertible, a permutation \(P\) will put its rows in the right order to factor \(PA=LU\). There must be a full set of pivots after row exchanges for \(A\) to be invertible.
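
SciPy's LU routine shows this factorization (a sketch; note that scipy.linalg.lu returns \(P, L, U\) with \(A=PLU\), so the book's \(PA=LU\) uses the transpose of that \(P\)):

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[0., 2.], [3., 4.]])   # a row exchange is needed: zero in the pivot position
    P, L, U = lu(A)                      # SciPy convention: A = P L U
    assert np.allclose(A, P @ L @ U)
    assert np.allclose(P.T @ A, L @ U)   # the book's form PA = LU, with P replaced by P.T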

The Transpose of a Derivative

The matrix changes to a derivative so \(\bs{A=d/dt}\). The inner product changes from the sum of \(x_ky_k\) to the integral of \(x(t)y(t)\).

Note

Inner product of functions: \(\dp x^Ty=(x,y)=\int^{\infty}_{-\infty}x(t)y(t)\ dt\).

The word “adjoint” is more correct than “transpose” when we are working with derivatives.

The transpose of a matrix has \((A\x)^T\y=\x^T(A^T\y)\). The adjoint of \(A=\frac{d}{dt}\) has

\[(Ax,y) = \int^{\infty}_{-\infty} \frac{dx}{dt}y(t)dt = \int^{\infty}_{-\infty} x(t)\left( -\frac{dy}{dt} \right)dt = (x,A^Ty)\]

The derivative moves from the first function \(x(t)\) to the second function \(y(t)\). During that move, a minus sign appears. The transpose of the derivative is minus the derivative.

The derivative is antisymmetric: \(\bs{A=d/dt}\) and \(\bs{A^T=-d/dt}\). Symmetric matrices have \(S^T=S\), antisymmetric matrices have \(A^T=-A\).

This antisymmetry of the derivative applies also to centered difference matrices.

\[\begin{split}A = \bb 0&1&0&0\\-1&0&1&0\\0&-1&0&1\\0&0&-1&0 \eb\mathrm{\ transposes\ to\ } A^T=\bb 0&-1&0&0\\1&0&-1&0\\0&1&0&-1\\0&0&1&0 \eb = -A.\end{split}\]

A forward difference matrix transposes to a backward difference matrix, multiplied by -1. In differential equations, the second derivative (acceleration) is symmetric. The first derivative (damping proportional to velocity) is antisymmetric.
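
A quick check of this antisymmetry for the 4 by 4 centered difference matrix above (a NumPy sketch):

    import numpy as np

    n = 4
    A = np.diag(np.ones(n - 1), k=1) - np.diag(np.ones(n - 1), k=-1)
    assert np.allclose(A.T, -A)          # centered differences: A^T = -A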

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 3 Vector Spaces and Subspaces

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 3.1 Spaces of Vectors

The vector spaces are denoted by \(\R^1, \R^2, \R^3, \cds\). Each space \(\R^n\) consists of a whole collection of vectors.

Note

DEFINITION: The space \(\R^n\) consists of all column vectors \(\v\) with \(n\) components.

The components of \(\v\) are real numbers, which is the reason for the letter \(\R\). A vector whose \(n\) components are complex numbers lies in the space \(\bs{\rm{C}}^n\).

The vector space \(\R^2\) is represented by the usual \(xy\) plane. Each vector \(\v\) in \(\R^2\) has two components that give the \(x\) and \(y\) coordinates of a point in the plane \(\v=(x,y)\).

The vectors in \(\R^3\) correspond to points \((x,y,z)\) in three-dimensional space. The one-dimensional space \(\R^1\) is a line (like the \(x\) axis).

The two essential vector operations go on inside the vector space, and they produce linear combinations:

Tip

We can add any vectors in \(\R^n\), and we can multiply any vector \(\v\) by any scalar \(c\).

“Inside the vector space” means that the result stays in the space.

A real vector space is a set of “vectors” together with rules for vector addition and for multiplication by real numbers.

Note

  • \(\bs{\rm{M}}\): The vector space of all real 2 by 2 matrices.

  • \(\bs{\rm{F}}\): The vector space of all real functions \(f(x)\).

  • \(\bs{\rm{Z}}\): The vector space that consists only of a zero vector.

The function space \(\bs{\rm{F}}\) is infinite-dimensional. A smaller function space is \(\bs{\rm{P}}\), or \(\bs{\rm{P}}_n\), containing all polynomials \(a_0 + a_1x + \cds + a_nx^n\) of degree \(n\).

The space \(\bs{\rm{Z}}\) is zero-dimensional. \(\bs{\rm{Z}}\) is the smallest possible vector space. The vector space \(\bs{\rm{Z}}\) contains exactly one vector (zero). Each space has its own zero vector.

Subspaces

Note

DEFINITION: A subspace of a vector space is a set of vectors (including \(\bs{0}\)) that satisfies two requirements: If \(\v\) and \(\w\) are in the subspace and \(c\) is any scalar, then

  1. \(\v+\w\) is in the subspace;

  2. \(c\v\) is in the subspace.

In short, all linear combinations stay in the subspace.

Every subspace contains the zero vector. Planes that don’t contain the origin fail those tests. Those planes are not subspaces.

Lines through the origin are also subspaces. Another subspace is all of \(\R^3\). The whole space is a subspace (of itself).

A list of all the possible subspaces of \(\R^3\):

  1. \(\bs{\rm{L}}\): Any line through \((0,0,0)\)

  2. \(\bs{\rm{P}}\): Any plane through \((0,0,0)\)

  3. \(\R^3\): The whole space

  4. \(\bs{\rm{Z}}\): The single vector \((0,0,0)\)

The quarter-plane is not a subspace. Two quarter-planes don’t make a subspace.

Note

A subspace containing \(\v\) and \(\w\) must contain all linear combinations \(c\v+d\w\).

The Column Space of \(A\)

Start with the columns of \(A\) and take all their linear combinations. This produces the column space of \(A\). It is a vector space made up of column vectors.

Note

DEFINITION: The column space consists of all linear combinations of the columns. The combinations are all possible vectors \(A\x\). They fill the column space \(\bs{C}(A)\).

To solve \(A\x=\b\) is to express \(\b\) as a combination of the columns.

Note

The system \(A\x=\b\) is solvable if and only if \(\b\) is in the column space of \(A\).

Suppose \(A\) is an \(m\) by \(n\) matrix. Its columns have \(m\) components. So the columns belong to \(\R^m\). The column space of \(A\) is a subspace of \(\R^m\).

Notation: The column space of \(A\) is denoted by \(\bs{C}(A)\). Start with the columns and take all their linear combinations. We might get the whole \(\R^m\) or only a subspace.

Important: Instead of columns in \(\R^m\), we could start with any set \(\bs{\rm{S}}\) of vectors in a vector space \(\bs{\rm{V}}\). To get a subspace \(\bs{\rm{SS}}\) of \(\bs{\rm{V}}\), we take all combinations of the vectors in that set:

  • \(\bs{\rm{S}}\) = set of vectors in \(\bs{\rm{V}}\) (probably not a subspace)

  • \(\bs{\rm{SS}}\) = all combinations of vectors in \(\bs{\rm{S}}\) (definitely a subspace)

Note

\(\bs{\rm{SS}}\) = all \(c_1\v_1 + \cds + c_N\v_N\) = the subspace of \(\bs{\rm{V}}\) “spanned” by \(\bs{\rm{S}}\)

When \(\bs{\rm{S}}\) is the set of columns, \(\bs{\rm{SS}}\) is the column space. When there is only one nonzero vector \(\v\) in \(\bs{\rm{S}}\), the subspace \(\bs{\rm{SS}}\) is the line through \(\v\). Always \(\bs{\rm{SS}}\) is the smallest subspace containing \(\bs{\rm{S}}\).

The columns “span” the column space.

Tip

The subspace \(\bs{\rm{SS}}\) is the “span” of \(\bs{\rm{S}}\), containing all combinations of vectors in \(\bs{\rm{S}}\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 3.2 The Nullspace of \(A\): Solving \(A\boldsymbol{x} = 0\) and \(R\boldsymbol{x}=0\)

For an \(m\) by \(n\) matrix, one immediate solution to \(A\x=\bs{0}\) is \(\x=\bs{0}\). For invertible matrices this is the only solution. For other (non-invertible) matrices, there are nonzero solutions to \(A\x=\bs{0}\). Each solution \(\x\) belongs to the nullspace of \(A\).

Note

The nullspace \(N(A)\) consists of all solutions to \(A\x=\bs{0}\). These vectors \(\x\) are in \(\R^n\).

The solution vectors \(\x\) have \(n\) components. They are vectors in \(\R^n\), so the nullspace is a subspace of \(\R^n\). The column space \(\bs{C}(A)\) is a subspace of \(\R^m\).

Note

Special solution \(A\bs{s}=\bs{0}\): The nullspace of \(A=\bb 1&2\\3&6 \eb\) contains all multiples of \(\bs{s} = \bb -2\\1 \eb\).

This is the best way to describe the nullspace, by computing special solutions to \(A\x=\bs{0}\). The solution is special because we set the free variable to \(\bs{x_2=1}\).

Tip

The nullspace of \(A\) consists of all combinations of the special solutions to \(A\x=\bs{0}\).

\[\begin{split}\bb 1&2&3 \eb\bb x\\y\\z \eb = 0 \mathrm{\ has\ two\ special\ solutions\ } \bs{s}_1 = \bb -2\\1\\0 \eb \mathrm{\ and\ } \bs{s}_2 = \bb -3\\0\\1 \eb.\end{split}\]

The last two components of \(\bs{s}_1\) and \(\bs{s}_2\) are “free” and we choose them specially as 1,0 and 0,1. Then the first components -2 and -3 are determined by the equation \(A\x=\bs{0}\).
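
SymPy finds these special solutions directly (a sketch; Matrix.nullspace builds one solution per free variable):

    from sympy import Matrix

    A = Matrix([[1, 2, 3]])
    print(A.nullspace())
    # [Matrix([[-2], [1], [0]]), Matrix([[-3], [0], [1]])]  -> s1 and s2 above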

Pivot Columns and Free Columns

The first column of \(A=\bb 1&2&3 \eb\) contains the only pivot, so the first component of \(\x\) is not free. The free components correspond to columns with no pivots. The special choice (one or zero) is only for the free variables in the special solutions.

\[\begin{split}A = \bb 1&2\\3&8 \eb \quad B = \bb A\\2A \eb = \bb 1&2\\3&8\\2&4\\6&16 \eb \quad C = \bb A&2A \eb = \bb 1&2&2&4\\3&8&6&16 \eb.\end{split}\]

The equation \(A\x=\bs{0}\) has only the zero solution \(\x=\bs{0}\). The nullspace is \(\bs{Z}\). It contains only the single point \(\x=\bs{0}\) in \(\R^2\).

\[\begin{split}A\x = \bb 1&2\\3&8 \eb\bb x_1\\x_2 \eb = \bb 0\\0 \eb \rm{\ yields\ } \bb 1&2\\0&2 \eb\bb x_1\\x_2 \eb = \bb 0\\0 \eb \rm{\ and\ } \bb x_1=0\\x_2=0 \eb.\end{split}\]

\(A\) is invertible. There are no special solutions. Both columns of this matrix have pivots.

The rectangular matrix \(B\) has the same nullspace \(\bs{Z}\). The extra rows impose more conditions on the vectors \(\x\) in the nullspace.

The rectangular matrix \(C\) has extra columns instead of extra rows. The solution vector \(\x\) has four components. Elimination will produce pivots in the first two columns of \(C\), but the last two columns of \(C\) and \(U\) are “free”. They don’t have pivots:

\[ \begin{align}\begin{aligned}\begin{split}C=\bb 1&2&2&4\\3&8&6&16 \eb\rm{\ becomes\ } U=\bb 1&2&2&4\\0&2&0&4 \eb\end{split}\\\uparrow\quad\uparrow\quad\uparrow\quad\uparrow\;\;\\\rm{pivot}\quad\;\rm{free}\;\;\end{aligned}\end{align} \]

For the free variables \(x_3\) and \(x_4\), we make special choices of ones and zeros. First \(x_3=1, x_4=0\) and second \(x_3=0, x_4=1\). We get two special solutions in the nullspace of \(C\) which is also the nullspace of \(U\):

\[\begin{split}\bs{s_1}=\bb -2\\0\\1\\0 \eb\rm{\ and\ }\bs{s_2}=\bb 0\\-2\\0\\1 \eb \begin{matrix} \leftarrow\rm{pivot\ variable}\\ \leftarrow\rm{pivot\ variable}\\ \leftarrow\rm{free\ variable}\\ \leftarrow\rm{free\ variable} \end{matrix}\end{split}\]

The Reduced Row Echelon Form \(R\)

Note

  1. Produce zeros above the pivots. Use pivot rows to eliminate upward in \(R\).

  2. Produce ones in the pivots. Divide the whole pivot row by its pivot.

The nullspace stays the same: \(\N(A)=\N(U)=\N(R)\). This nullspace becomes easiest to see when we reach the reduced row echelon form \(R = \rm{rref}(A)\). The pivot columns of \(R\) contain \(I\).

Note

Reduced form \(R\): \(U=\bb 1&2&2&4\\0&2&0&4 \eb\) becomes \(R=\bb 1&0&2&0\\0&1&0&2 \eb\).

Now (free column 3) = 2 (pivot column 1), so -2 appears in \(\bs{s}_1=(-2,0,1,0)\). Second special solution \(\bs{s}_2=(0,-2,0,1)\).
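
The reduced form \(R\), the pivot columns, and the special solutions can all be checked with SymPy (a sketch using the matrix \(C\) from above):

    from sympy import Matrix

    C = Matrix([[1, 2, 2, 4], [3, 8, 6, 16]])
    R, pivots = C.rref()
    print(R)               # Matrix([[1, 0, 2, 0], [0, 1, 0, 2]])
    print(pivots)          # (0, 1): the first two columns are the pivot columns
    print(C.nullspace())   # the special solutions (-2, 0, 1, 0) and (0, -2, 0, 1)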

The case of a zero nullspace \(\bs{\rm{Z}}\) is of the greatest importance. It says that the columns of \(A\) are independent.

Pivot Variables and Free Variables in the Echelon Matrix \(R\):

\[\begin{split}A=\bb p&p&f&p&f\\|&|&|&|&|\\|&|&|&|&|\\|&|&|&|&|\\ \eb \quad R=\bb 1&0&a&0&c\\0&1&b&0&d\\0&0&0&1&e\\0&0&0&0&0\\ \eb \quad \bs{s}_1=\bb -a\\-b\\1\\0\\0 \eb \quad \bs{s}_2=\bb -c\\-d\\0\\-e\\1 \eb\end{split}\]
  • \(A\): 3 pivot columns \(p\), 2 free columns \(f\) to be revealed by \(R\).

  • \(R\): \(I\) in pivot columns, \(F\) in free columns; 3 pivots: rank \(r=3\).

  • Special \(R\bs{s}_1=\bs{0}\) and \(R\bs{s}_2=\bs{0}\) take \(-a\) to \(-e\) from \(R\); \(R\bs{s}=\bs{0}\) means \(A\bs{s}=\bs{0}\).

Here are those steps for a 4 by 7 reduced row echelon matrix \(R\) with three pivots:

Note

\(R=\bb 1&0&x&x&x&0&x\\0&1&x&x&x&0&x\\0&0&0&0&0&1&x\\0&0&0&0&0&0&0 \eb\)

  • Three pivot variables \(x_1, x_2, x_6\)

  • Four free variables \(x_3, x_4, x_5, x_7\)

  • Four special solutions \(s\) in \(N(R)\)

  • The pivot rows and columns contain \(I\)

The column space \(\bs{C}(R)\) consists of all vectors of the form \((b_1,b_2,b_3,0)\). The nullspace \(\bs{N}(R)\) is a subspace of \(\R^7\). The solutions to \(R\x=\0\) are all the combinations of the four special solutions – one for each free variable:

  1. Columns 3, 4, 5, 7 have no pivots. So the four free variables are \(x_3, x_4, x_5, x_7\).

  2. Set one free variable to 1 and set the other three free variables to zero.

  3. To find \(\bs{s}\), solve \(R\bs{x}=\bs{0}\) for the pivot variables \(x_1, x_2, x_6\).

Note

Suppose \(A\x=\0\) has more unknowns than equations (\(\bs{n}>\bs{m}\), more columns than rows). There must be at least one free column. Then \(A\x=\0\) has nonzero solutions.

A short wide matrix (\(n>m\)) always has nonzero vectors in its nullspace. There must be at least \(n-m\) free variables, since the number of pivots cannot exceed \(m\). A row might have no pivot – which means an extra free variable. When there is a free variable, it can be set to 1. Then the equation \(A\x=\0\) has at least a line of nonzero solutions.

The nullspace is a subspace. Its “Dimension” is the number of free variables.

The Rank of a Matrix

The true size of \(A\) is given by its rank.

Note

DEFINITION OF RANK: The rank of \(A\) is the number of pivots. This number is \(r\).

Every “free column” is a combination of earlier pivot columns.
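
In code the rank can be checked numerically (a sketch; numpy.linalg.matrix_rank counts nonzero singular values, which agrees with the pivot count here):

    import numpy as np

    C = np.array([[1., 2., 2., 4.], [3., 8., 6., 16.]])   # the matrix C from earlier in this section
    print(np.linalg.matrix_rank(C))                       # 2 pivots, so the rank is r = 2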

Rank One

Matrices of rank one have only one pivot. Every row is a multiple of the pivot row.

Rank one matrix:

\[\begin{split}A=\bb 1&3&10\\2&6&20\\3&9&30 \eb\rightarrow R=\bb 1&3&10\\0&0&0\\0&0&0 \eb.\end{split}\]

\(A =\) column times row \(= \u\v^T\):

\[\begin{split}\bb 1&3&10\\2&6&20\\3&9&30 \eb=\bb 1\\2\\3 \eb\bb 1&3&10 \eb.\end{split}\]

With rank one \(A\x=\0\) is easy to understand. That equation \(\u(\v^T\x)=\0\) leads us to \(\v^T\x=0\). All vectors \(\x\) in the nullspace must be orthogonal to \(\v\) in the row space. This is the geometry when \(r=1\): row space = line, nullspace = perpendicular plane.
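
A check of the column-times-row form \(A=\u\v^T\) for this example (a NumPy sketch):

    import numpy as np

    u = np.array([[1.], [2.], [3.]])          # column u
    v_T = np.array([[1., 3., 10.]])           # row v^T
    A = u @ v_T                               # rank one matrix, column times row
    assert np.allclose(A, [[1, 3, 10], [2, 6, 20], [3, 9, 30]])
    assert np.linalg.matrix_rank(A) == 1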

The second definition of rank: the number of independent rows. This is also the number of independent columns.

The third definition of rank: the “dimension” of the column space. It is also the dimension of the row space. \(n-r\) is the dimension of the nullspace.

Every \(m\) by \(n\) matrix of rank \(r\) reduces to (\(m\) by \(r\)) times (\(r\) by \(n\)):

Note

\(A=(\rm{pivot\ columns\ of\ }A) (\rm{first\ }r\rm{\ rows\ of\ }R)=(\bs{\rm{COL}})(\bs{\rm{ROW}})\).

Elimination: The Big Picture

Question 1 Is this column a combination of previous columns?

If the column contains a pivot, the answer is no. Pivot columns are “independent” of previous columns. If column 4 has no pivot, it is a combination of columns 1, 2, 3.

Question 2 Is this row a combination of previous rows?

If the row contains a pivot, the answer is no. Pivot rows are “independent” of previous rows. If row 3 ends up with no pivot, it is a zero row and it is moved to the bottom of \(R\).

In other words, \(R\) tells us the special solutions to \(A\x=\0\). \(R\) reveals a “basis” for three fundamental subspaces:

The column space of \(A\)–choose the pivot columns of \(A\) as a basis.

The row space of \(A\)–choose the nonzero rows of \(R\) as a basis.

The nullspace of \(A\)–choose the special solutions to \(R\x=\0\) (and \(A\x=\0\)).

We learn from elimination the single most important number–the rank \(\bs{r}\). That number counts the pivot columns and the pivot rows. Then \(n-r\) counts the free columns and the special solutions.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 3.3 The Complete Solution to \(Ax=b\)

The last section totally solved \(A\x=\0\). Elimination converted the problem to \(R\x=\0\). Now the right side \(\b\) is not zero. \(A\x=\b\) is reduced to a simple system \(R\x=\bs{d}\) with the same solutions. One way to organize that is to add \(\b\) as an extra column of the matrix.

\[\begin{split}\bb 1&3&0&2\\0&0&1&4\\1&3&1&6 \eb\bb x_1\\x_2\\x_3\\x_4 \eb=\bb 1\\6\\7 \eb \quad\begin{matrix} \rm{has\ the}\\\rm{augmented}\\\rm{matrix} \end{matrix} \quad\bb 1&3&0&2&1\\0&0&1&4&6\\1&3&1&6&7 \eb=\bb A&\b \eb\end{split}\]

When we apply the usual elimination steps to \(A\), reaching \(R\), we also apply them to \(\b\).

\[\begin{split}\bb 1&3&0&2\\0&0&1&4\\0&0&0&0 \eb\bb x_1\\x_2\\x_3\\x_4 \eb=\bb 1\\6\\0 \eb \quad\begin{matrix} \rm{has\ the}\\\rm{augmented}\\\rm{matrix} \end{matrix} \quad\bb 1&3&0&2&1\\0&0&1&4&6\\0&0&0&0&0 \eb=\bb R&\bs{d} \eb\end{split}\]

The very last zero is crucial. The third equation has become \(0=0\). So the equations can be solved.

Here are the same augmented matrices for a general \(\b = (b_1,b_2,b_3)\):

\[\begin{split}\bb A&\b \eb=\bb 1&3&0&2&b_1\\0&0&1&4&b_2\\1&3&1&6&b_3 \eb\rightarrow \bb 1&3&0&2&b_1\\0&0&1&4&b_2\\0&0&0&0&b_3-b_1-b_2 \eb=\bb R&\bs{d} \eb\end{split}\]

Now we get \(0=0\) in the third equation only if \(b_3-b_1-b_2=0\). This is \(b_1+b_2=b_3\).

One Particular Solution \(Ax_p=b\)

For an easy solution \(\x_p\), choose the free variables to be zero: \(x_2=x_4=0\). Then the two nonzero equations give the two pivot variables \(x_1=1\) and \(x_3=6\).

Tip

For a solution to exist, zero rows in \(R\) must also be zero in \(\bs{d}\). Since \(I\) is in the pivot rows and pivot columns of \(R\), the pivot variables in \(x_{\rm{particular}}\) come from \(\bs{d}\):

  • \(R\x_p=\bb 1&3&0&2\\0&0&1&4\\0&0&0&0 \eb\bb 1\\0\\6\\0 \eb=\bb 1\\6\\0 \eb\)

Notice how we choose the free variables (as zero) and solve for the pivot variables. When the free variables are zero, the pivot variables for \(\x_p\) are already seen in the right side vector \(\bs{d}\).

Note

\(\x_{\rm{particular}}\): The particular solution solves \(A\x_p=\b\).

\(\x_{\rm{nullspace}}\): The \(n-r\) special solutions solve \(A\x_n=\0\).

That particular solution is \((1,0,6,0)\). The two special (nullspace) solutions to \(R\x=\0\) come from the two free columns of \(R\), by reversing signs of 3, 2, and 4.

Note

Complete solution one \(\x_p\) many \(\x_n\):

  • \(\x=\x_p+\x_n=\bb 1\\0\\6\\0 \eb+x_2\bb -3\\1\\0\\0 \eb+x_4\bb -2\\0\\-4\\1 \eb\).
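
A numerical check that every vector of this form solves \(A\x=\b\) (a sketch using the 3 by 4 system above; the free variables \(x_2, x_4\) can take any values):

    import numpy as np

    A = np.array([[1., 3., 0., 2.], [0., 0., 1., 4.], [1., 3., 1., 6.]])
    b = np.array([1., 6., 7.])
    xp = np.array([1., 0., 6., 0.])            # particular solution
    s1 = np.array([-3., 1., 0., 0.])           # special solution (x2 = 1)
    s2 = np.array([-2., 0., -4., 1.])          # special solution (x4 = 1)

    for x2, x4 in [(0, 0), (1, 0), (2, -5)]:   # arbitrary choices of the free variables
        assert np.allclose(A @ (xp + x2 * s1 + x4 * s2), b)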

Question: Suppose \(A\) is a square invertible matrix, \(m=n=r\). What are \(\x_p\) and \(\x_n\)?

Answer: The particular solution is the one and only solution \(\x_p=A^{-1}\b\). There are no special solutions or free variables. \(R=I\) has no zero rows.

The only vector in the nullspace is \(\x_n=\0\). The complete solution is \(\x=\x_p+\x_n=A^{-1}\b+\0\).

An important case: \(A\) has full column rank. Every column has a pivot. The rank is \(r=n\). The matrix is tall and thin (\(m\geq n\)). Row reduction puts \(I\) at the top, when \(A\) is reduced to \(R\) with rank \(n\):

Full Column Rank:

\[\begin{split}R=\bb I\\0 \eb=\bb n\rm{\ by\ }n\rm{\ identity\ matrix}\\m-n \rm{\ rows\ of\ zeros}\eb.\end{split}\]

There are no free columns or free variables. The nullspace is \(\bs{\rm{Z}}\) = {zero vector}.

Note

Every matrix \(A\) with full column rank \((r=n)\) has all these properties:

  1. All columns of \(A\) are pivot columns.

  2. There are no free variables or special solutions.

  3. The nullspace \(\bs{N}(A)\) contains only the zero vector \(\x=\0\).

  4. If \(A\x=\b\) has a solution (it might not) then it has only one solution.

In the essential language of the next section, this \(A\) has independent columns. \(A\x=\0\) only happens when \(\x=\0\). The square matrix \(A^TA\) is invertible when the rank is \(n\).

In this case the nullspace of \(A\) has shrunk to the zero vector. The solution to \(A\x=\b\) is unique (if it exists). There will be \(m-n\) zero rows in \(R\). So there are \(m-n\) conditions on \(\b\) in order to have \(0=0\) in those rows and to put \(\b\) in the column space.

Tip

With full column rank, \(A\x=\b\) has one solution or no solution (\(m>n\) is overdetermined).

The Complete Solution

The other extreme case is full row rank. Now \(A\x=\b\) has one or infinitely many solutions. In this case \(A\) must be short and wide \((m\leq n)\). A matrix has full row rank if \(\bs{r=m}\).

Full row rank (rank \(r=m=2\)):

\[ \begin{align}\begin{aligned}x+y+z&=3\\x+2y-z&=4\end{aligned}\end{align} \]

The particular solution will be one point on the line. Adding the nullspace vectors \(\x_n\) will move us along the line. Then \(\x=\x_p+\x_n\) gives the whole line of solutions.

Tip

Complete solution = one particular solution + all nullspace solutions.

\[\begin{split}\bb 1&1&1&3\\1&2&-1&4 \eb\rightarrow\bb 1&1&1&3\\0&1&-2&1 \eb\rightarrow\bb 1&0&3&2\\0&1&-2&1 \eb=\bb R&\bs{d} \eb.\end{split}\]

The particular solution has free variable \(x_3=0\). The special solution has \(x_3=1\):

  • \(\x_{\rm{particular}}\) comes directly from \(\bs{d}\) on the right side: \(\x_p=(2,1,0)\).

  • \(\x_{\rm{special}}\) comes from the third column (free column) of \(R\): \(\bs{s}=(-3,2,1)\).

Note

Complete solution: \(\x=\x_p+\x_n=\bb 2\\1\\0 \eb+x_3\bb -3\\2\\1\eb\).

If \(m<n\) the equation \(A\x=\b\) is underdetermined (many solutions).

Note

Every matrix \(A\) with full row rank \(\bs{r=m}\) has all these properties:

  1. All rows have pivots, and \(R\) has no zero rows.

  2. \(A\x=\b\) has a solution for every right side \(\b\).

  3. The column space is the whole space \(\bs{R}^m\).

  4. There are \(n-r=n-m\) special solutions in the nullspace of \(A\).

In this case with \(m\) pivots, the rows are “linearly independent”. So the columns of \(A^T\) are linearly independent. The nullspace of \(A^T\) is the zero vector.

The four possibilities for linear equations depend on the rank \(\bs{r}\). The reduced \(R\) will fall in the same category as the matrix \(A\). \(F\) is the free part of \(R\):

  1. \(r=m\) and \(r=n\) Square and invertible \(A\x=\b\) has 1 solution: \(R=\bb I \eb\).

  2. \(r=m\) and \(r<n\) Short and wide \(A\x=\b\) has \(\infty\) solutions: \(R=\bb I&F \eb\).

  3. \(r<m\) and \(r=n\) Tall and thin \(A\x=\b\) has 0 or 1 solution: \(R=\bb I\\0 \eb\).

  4. \(r<m\) and \(r<n\) Not full rank \(A\x=\b\) has 0 or \(\infty\) solutions: \(R=\bb I&F\\0&0 \eb\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 3.4 Independence, Basis and Dimension

There are \(n\) columns in an \(m\) by \(n\) matrix. But the true “dimension” of the column space is not necessarily \(n\). The dimension is measured by counting independent columns. The true dimension of the column space is the rank \(r\).

The goal is to understand a basis: independent vectors that “span the space”.

Tip

Every vector in the space is a unique combination of the basis vectors.

Note

  1. Independent vectors: no extra vectors.

  2. Spanning a space: enough vectors to produce the rest.

  3. Basis for a space: not too many or too few.

  4. Dimension of a space: the number of vectors in a basis.

Linear Independence

Note

DEFINITION: The columns of \(A\) are linearly independent when the only solution to \(A\x=\0\) is \(\x=\0\). No other combination \(A\x\) of the columns gives the zero vector.

The columns are independent when the nullspace \(\bs{N}(A)\) contains only the zero vector.

Note

DEFINITION: The sequence of vectors \(\v_1,\cds,\v_n\) is linearly independent if the only combination that gives the zero vector is \(0\v_1+0\v_2+\cds+0\v_n\).

  • \(x_1\v_1+x_2\v_2+\cds+x_n\v_n=\0\) only happens when all \(x\)’s are zero.

If a combination gives \(\0\), when the \(x\)’s are not all zero, the vectors are dependent.

Note

Full column rank: The columns of \(A\) are independent exactly when the rank is \(r=n\). There are \(n\) pivots and no free variables. Only \(\x=\0\) is in the nullspace.

Note

Any set of \(n\) vectors in \(\R^m\) must be linearly dependent if \(n>m\).

The columns might be dependent or might be independent if \(n\leq m\). Elimination will reveal the \(r\) pivot columns. It is those \(r\) pivot columns that are independent.
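
A sketch of the rank test for independence (columns are independent exactly when \(r=n\), the number of columns):

    import numpy as np

    A = np.array([[1., 0.], [1., 1.], [1., 1.]])    # 3 by 2, independent columns
    B = np.array([[1., 2.], [2., 4.], [3., 6.]])    # second column = 2 times the first

    print(np.linalg.matrix_rank(A) == A.shape[1])   # True:  r = n, columns independent
    print(np.linalg.matrix_rank(B) == B.shape[1])   # False: r < n, columns dependent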

Vectors that Span a Subspace

The column space consists of all combinations \(A\x\) of the columns.

Note

DEFINITION: A set of vectors spans a space if their linear combinations fill the space.

Tip

The columns of a matrix span its column space. They might be dependent.

The combinations of the rows produce the “row space”.

Note

DEFINITION: The row space of a matrix is the subspace of \(\R^n\) spanned by the rows. The row space of \(A\) is \(\bs{C}(A^T)\). It is the column space of \(A^T\).

A Basis for a Vector Space

We want enough independent vectors to span the space (and not more).

Note

DEFINITION: A basis for a vector space is a sequence of vectors with two properties:

  • The basis vectors are linearly independent and they span the space.

Tip

There is one and only one way to write \(\v\) as a combination of the basis vectors.

Suppose \(\v=a_1\v_1+\cds+a_n\v_n\) and also \(\v=b_1\v_1+\cds+b_n\v_n\). By subtraction \((a_1-b_1)\v_1+\cds+(a_n-b_n)\v_n\) is the zero vector. From the independence of the \(\v\)’s, each \(a_i-b_i=0\). Hence \(a_i=b_i\), and there are not two ways to produce \(\v\).

The columns of the \(n\) by \(n\) identity matrix give the “standard basis” for \(\R^n\).

The columns of every invertible \(n\) by \(n\) matrix give a basis for \(\R^n\):

  • Invertible matrix: Independent columns; Column space is \(\R^3\):

    \[\begin{split}A=\bb 1&0&0\\1&1&0\\1&1&1 \eb.\end{split}\]
  • Singular matrix: Dependent columns; Column space \(\neq \R^3\):

    \[\begin{split}B=\bb 1&0&1\\1&1&2\\1&1&2 \eb.\end{split}\]

The only solution to \(A\x=\0\) is \(\x=A^{-1}\0=\0\). The columns are independent. They span the whole space \(\R^n\)–because every vector \(\b\) is a combination of the columns. \(A\x=\b\) can always be solved by \(\x=A^{-1}\b\).

Note

The vectors \(\v_1,\cds,\v_n\) are a basis for \(\R^n\) exactly when they are the columns of an \(n\) by \(n\) invertible matrix. Thus \(\R^n\) has infinitely many different bases.

When the columns are dependent, we keep only the pivot columns–the first two columns of \(B\) above, with its two pivots. They are independent and they span the column space.

Note

The pivot columns of \(A\) are a basis for its column space. The pivot rows of \(A\) are a basis for its row space. So are the pivot rows of its echelon form \(R\).

Note

Question: Given five vectors in \(\R^7\), how do you find a basis for the space they span?

First answer: Make them the rows of \(A\), and eliminate to find the nonzero rows of \(R\).

Second answer: Put the five vectors into the columns of \(A\). Eliminate to find the pivot columns (of \(A\) not \(R\)). Those pivot columns are a basis for the column space.
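
A sketch of both answers with SymPy, shown on three hypothetical vectors (the same recipe works for five vectors in \(\R^7\)):

    from sympy import Matrix

    v1, v2, v3 = [1, 0, 1, 2], [2, 0, 2, 4], [0, 1, 1, 0]   # v2 = 2 v1, so the span has dimension 2

    # First answer: make them the rows; the nonzero rows of R are a basis
    R, _ = Matrix([v1, v2, v3]).rref()
    print(R)                      # two nonzero rows

    # Second answer: make them the columns; the pivot columns of A are a basis
    A = Matrix([v1, v2, v3]).T
    print(A.columnspace())        # the pivot columns of A itself (not of R)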

All bases for a vector space contain the same number of vectors.

Tip

The number of vectors, in any and every basis, is the “dimension” of the space.

Dimension of a Vector Space

Note

If \(\v_1,\cds,\v_m\) and \(\w_1,\cds,\w_n\) are both bases for the same vector space, then \(m=n\).

Proof: Suppose that there are more \(\w\)’s than \(\v\)’s. From \(n>m\) we want to reach a contradiction. The \(\v\)’s are a basis, so \(\w_1\) must be a combination of the \(\v\)’s. If \(\w_1\) equals \(a_{11}\v_1+\cds+a_{m1}\v_m\), this is the first column of a matrix multiplication \(VA\):

Each \(\w\) is a combination of the \(\v\)‘s:

\[\begin{split}W=\bb &&&\\\w_1&\w_2&\cds&\w_n\\&&& \eb=\bb &&\\\v_1&\cds&\v_m\\&& \eb \bb a_{11}&&a_{1n}\\\vdots&&\vdots\\a_{m1}&&a_{mn} \eb=VA.\end{split}\]

We don’t know each \(a_{ij}\), but we know the shape of \(A\) (it is \(m\) by \(n\)). The second vector \(\w_2\) is also a combination of the \(\v\)’s. The coefficients in that combination fill the second column of \(A\). The key is that \(A\) has a row for every \(\v\) and a column for every \(\w\). \(A\) is a short wide matrix, since we assumed \(n>m\). So \(A\x=\0\) has a nonzero solution.

\(A\x=\0\) gives \(VA\x=\0\) which is \(W\x=\0\). A combination of the \(\w\)’s gives zero! Then the \(\w\)’s could not be a basis, so \(n>m\) is not possible.

If \(m>n\) we exchange the \(\v\)’s and \(\w\)’s and repeat the same steps. The only way to avoid a contradiction is to have \(m=n\). This completes the proof that \(m=n\).

The number of basis vectors depends on the space–not on a particular basis. The number is the same for every basis, and it counts the “degrees of freedom” in the space. The dimension of the space \(\R^n\) is \(n\).

Note

DEFINITION: The dimension of a space is the number of vectors in every basis.

Bases for Matrix Spaces and Function Spaces

Matrix spaces: The vector space \(\bs{M}\) contains all 2 by 2 matrices. Its dimension is 4.

One basis is:

\[\begin{split}A_1,A_2,A_3,A_4=\bb 1&0\\0&0 \eb,\bb 0&1\\0&0 \eb,\bb 0&0\\1&0 \eb,\bb 0&0\\0&1 \eb.\end{split}\]

Those matrices are linearly independent. We are not looking at their columns, but at the whole matrix. Combinations of those four matrices can produce any matrix in \(\bs{M}\), so they span the space:

Every \(A\) combines the basis matrices:

\[\begin{split}c_1A_1+c_2A_2+c_3A_3+c_4A_4=\bb c_1&c_2\\c_3&c_4 \eb=A.\end{split}\]

\(A\) is zero only if the \(c\)’s are all zero–this proves independence of \(A_1,A_2,A_3,A_4\).

The three matrices \(A_1,A_2,A_4\) are a basis for a subspace–the upper triangular matrices. Its dimension is 3. \(A_1\) and \(A_4\) are a basis for the diagonal matrices.

To push this further, think about the space of all \(n\) by \(n\) matrices. One possible basis uses matrices that have only a single nonzero entry (that entry is 1). There are \(n^2\) positions for that 1, so there are \(n^2\) basis matrices:

  • The dimension of the whole \(n\) by \(n\) matrix space is \(n^2\).

  • The dimension of the subspace of upper triangular matrices is \(\frac{1}{2}n^2+\frac{1}{2}n\).

  • The dimension of the subspace of diagonal matrices is \(n\).

  • The dimension of the subspace of symmetric matrices is \(\frac{1}{2}n^2+\frac{1}{2}n\).

Function spaces: The equations \(d^2y/dx^2=0\) and \(d^2y/dx^2=-y\) and \(d^2y/dx^2=y\) involve the second derivative. In calculus we solve to find the functions \(y(x)\):

  • \(y\ppr=0\) is solved by any linear function \(y=cx+d\).

  • \(y\ppr=-y\) is solved by any combination \(y=c\sin x+d\cos x\).

  • \(y\ppr=y\) is solved by any combination \(y=ce^x+de^{-x}\).

That solution space for \(y\ppr=-y\) has two basis functions: \(\sin x\) and \(\cos x\). The space for \(y\ppr=0\) has \(x\) and 1. It is the “nullspace” of the second derivative! The dimension is 2 in each case (these are second-order equations).

The dimension of the space \(\bs{\rm{Z}}\) is zero. The empty set (containing no vectors) is a basis for \(\bs{\rm{Z}}\). We can never allow the zero vector into a basis, because then linear independence is lost.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 3.5 Dimensions of the Four Subspaces

The rank of a matrix is the number of pivots. The dimension of a subspace is the number of vectors in a basis. The rank of \(A\) reveals the dimensions of all four fundamental subspaces.

Two subspaces come directly from \(A\), and the other two from \(A^T\):

Note

Four Fundamental Subspaces:

  1. The row space is \(\bs{C}(A^T)\), a subspace of \(\R^n\).

  2. The column space is \(\bs{C}(A)\), a subspace of \(\R^m\).

  3. The nullspace is \(\bs{N}(A)\), a subspace of \(\R^n\).

  4. The left nullspace is \(\bs{N}(A^T)\), a subspace of \(\R^m\).

The row space of \(A\) is the column space of \(A^T\).

For the left nullspace we solve \(A^T\y=\0\)–that system is \(n\) by \(m\). This is the nullspace of \(A^T\).

Part 1 of the Fundamental Theorem finds the dimensions of the four subspaces. The row space and column space have the same dimension \(r\). This number \(r\) is the rank of the matrix.

\(\bs{N}(A)\) and \(\bs{N}(A^T)\) have dimensions \(n-r\) and \(m-r\), to make up the full \(n\) and \(m\).

Part 2 of the Fundamental Theorem will describe how the four subspaces fit together.

The Four Subspaces for \(R\)

Suppose \(A\) is reduced to its row echelon form \(R\). The main point is that the four dimensions are the same for \(A\) and \(R\).

As a specific 3 by 5 example, look at the four subspaces for this echelon matrix \(R\):

\[\begin{split}R = \bb 1&3&5&0&7\\0&0&0&1&2\\0&0&0&0&0 \eb.\end{split}\]

The rank of this matrix is \(r=2\) (two pivots).
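
A sketch counting the four dimensions for this \(R\) with SymPy:

    from sympy import Matrix

    R = Matrix([[1, 3, 5, 0, 7], [0, 0, 0, 1, 2], [0, 0, 0, 0, 0]])
    m, n = R.shape                       # m = 3, n = 5
    r = R.rank()                         # r = 2

    print(len(R.columnspace()), r)       # column space (and row space) have dimension r = 2
    print(len(R.nullspace()), n - r)     # nullspace: n - r = 3 special solutions
    print(len(R.T.nullspace()), m - r)   # left nullspace: m - r = 1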

Note

1. The row space of \(R\) has dimension 2, matching the rank.

The first two rows are a basis. Rows 1 and 2 span the row space \(\bs{C}(R^T)\).

The pivot rows 1 and 2 are independent. If we look only at the pivot columns, we see the \(r\) by \(r\) identity matrix. So the \(r\) pivot rows are a basis for the row space.

Tip

The dimension of the row space is the rank \(r\). The nonzero rows of \(R\) form a basis.

Note

2. The column space of \(R\) also has dimension \(r=2\).

The pivot columns 1 and 4 form a basis for \(\bs{C}(R)\). They are independent because they start with the \(r\) by \(r\) identity matrix. Every other (free) column is a combination of the pivot columns.

Column 2 is 3 times column 1. The special solution is \((-3,1,0,0,0)\).

Column 3 is 5 times column 1. The special solution is \((-5,0,1,0,0)\).

Column 5 is 7 times column 1 plus 2 times column 4. The special solution is \((-7,0,0,-2,1)\).

Tip

The dimension of the column space is the rank \(r\). The pivot columns form a basis.

Note

3. The nullspace of \(R\) has dimension \(n-r=5-2\). There are \(n-r=3\) free variables. Here \(x_2,x_3,x_5\) are free (no pivots in those columns). They yield the three special solutions to \(R\x=\0\). Set a free variable to 1, and solve for \(x_1\) and \(x_4\).

\[\begin{split}\bs{s}_2 = \bb -3\\1\\0\\0\\0 \eb \quad \bs{s}_3 = \bb -5\\0\\1\\0\\0 \eb \quad \bs{s}_5 = \bb -7\\0\\0\\-2\\1 \eb \quad\end{split}\]

With \(n\) variables and \(r\) pivots, that leaves \(n-r\) free variables and special solutions. The special solutions are independent, because they contain the identity matrix in rows 2, 3, 5. So \(\bs{N}(R)\) has dimension \(n-r\).

Tip

The nullspace has dimension \(n-r\). The special solutions form a basis.

Note

4. The nullspace of \(R^T\) (left nullspace of \(R\)) has dimension \(m-r=3-2\).

The equation \(R^T\y=\0\) looks for combinations of the columns of \(R^T\) (the rows of \(R\)) that produce zero. The nullspace of \(R^T\) contains all vectors \(\y=(0,0,y_3)\).

Tip

If \(A\) is \(m\) by \(n\) of rank \(r\), its left nullspace has dimension \(m-r\).

Note

In \(\R^n\) the row space and nullspace have dimensions \(r\) and \(n-r\) (adding to \(n\)).

In \(\R^m\) the column space and left nullspace have dimensions \(r\) and \(m-r\) (total \(m\)).

The Four Subspaces for \(A\)

The subspace dimensions for \(A\) are the same as for \(R\).

This \(A\) reduces to \(R\) (Notice \(\bs{C}(A)\neq\bs{C}(R)\)!):

\[\begin{split}A=\bb 1&3&5&0&7\\0&0&0&1&2\\1&3&5&1&9 \eb\end{split}\]

1. \(A\) has the same row space as \(R\). Same dimension \(r\) and same basis.

Every row of \(A\) is a combination of the rows of \(R\) and vice versa. Elimination changes rows, but not row spaces. The good \(r\) rows of \(A\) are the ones that end up as pivot rows in \(R\).

2. The column space of \(A\) has dimension \(r\). The column rank equals the row rank.

Tip

Rank Theorem: The number of independent columns = the number of independent rows.

The same combinations of the columns are zero (or nonzero) for \(A\) and \(R\). Dependent in \(A \Leftrightarrow\) dependent in \(R\). \(A\x=\0\) exactly when \(R\x=\0\). The column spaces are different, but their dimensions are the same–equal to \(r\).

The \(r\) pivot columns of \(A\) are a basis for its column space \(\bs{C}(A)\).

3. \(A\) has the same nullspace as \(R\). Same dimension \(n-r\) and same basis.

The elimination steps don’t change the solutions. The special solutions are a basis for this nullspace (as we always knew). There are \(n-r\) free variables, so the dimension of the nullspace is \(n-r\). This is the Counting Theorem: \(r+(n-r)=n\).

Note

(dimension of column space) + (dimension of nullspace) = dimension of \(\R^n\).

4. The left nullspace of \(A\) (the nullspace of \(A^T\)) has dimension \(m-r\).

Note

Fundamental Theorem of Linear Algebra, Part 1:

  • The column space and row space both have dimension \(r\).

  • The nullspaces have dimensions \(n-r\) and \(m-r\).

Graphs are the most important model in discrete applied mathematics.

Rank One Matrices (Review)

Note

Every rank one matrix is one column times one row \(A=\u\v^T\).

Rank Two Matrices = Rank One plus Rank One

Some elimination matrix \(E\) simplifies \(A\) to \(\bs{EA=R}\). Then the inverse matrix \(C=E^{-1}\) connects \(R\) back to \(A\): \(\bs{A=CR}\).

\(R\) has the same row space as \(A\):

\[\begin{split}A=\bb 1&0&3\\1&1&7\\4&2&20 \eb=\bb 1&0&0\\1&1&0\\4&2&1 \eb\bb 1&0&3\\0&1&4\\0&0&0 \eb=CR.\end{split}\]

Matrix \(A\) rank two:

\[\begin{split}A=\bb &&\\\u_1&\u_2&\u_3\\&& \eb\bb &\v_1^T&\\&\v_2^T&\\&\rm{zero\ row}&\eb =\u_1\v_1^T+\u_2\v_2^T=(\rm{rank\ }1)+(\rm{rank\ }1)\end{split}\]

Every rank \(r\) matrix is a sum of \(r\) rank one matrices: Pivot columns of \(A\) times nonzero rows of \(R\). The row \(\bb 0&0&0 \eb\) simply disappeared.
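
A check of \(A=\u_1\v_1^T+\u_2\v_2^T\) for the example above (a NumPy sketch using the pivot columns of \(A\) and the nonzero rows of \(R\)):

    import numpy as np

    A = np.array([[1., 0., 3.], [1., 1., 7.], [4., 2., 20.]])
    COL = np.array([[1., 0.], [1., 1.], [4., 2.]])   # pivot columns of A
    ROW = np.array([[1., 0., 3.], [0., 1., 4.]])     # nonzero rows of R

    assert np.allclose(A, COL @ ROW)                 # A = (COL)(ROW)
    assert np.allclose(A, np.outer(COL[:, 0], ROW[0]) + np.outer(COL[:, 1], ROW[1]))   # rank 1 + rank 1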

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 4 Orthogonality

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 4.1 Orthogonality of the Four Subspaces

Two vectors are orthogonal when their dot product is zero: \(\v\cd\w=\v^T\w=0\). This chapter moves to orthogonal subspaces and orthogonal bases and orthogonal matrices. Think of \(a^2+b^2=c^2\) for a right triangle with sides \(\v\) and \(\w\).

Note

Orthogonal vectors: \(\v^T\w=0\) and \(\lv\v\rv^2+\lv\w\rv^2=\lv\v+\w\rv^2\).

The right side is \((\v+\w)^T(\v+\w)\). This equals \(\v^T\v+\w^T\w\) when \(\v^T\w=\w^T\v=0\).

The row space is perpendicular to the nullspace. Every row of \(A\) is perpendicular to every solution of \(A\x=\0\). This perpendicularity of subspaces is Part 2 of the Fundamental Theorem of Linear Algebra.

The column space is perpendicular to the nullspace of \(A^T\). When \(\b\) is outside the column space–when we want to solve \(A\x=\b\) and can’t do it–then this nullspace of \(A^T\) comes into its own. It contains the error \(\e=\b-A\x\) in the “least-squares” solution. Least squares is the key application of linear algebra in this chapter.

The row space and nullspace are orthogonal subspaces inside \(\R^n\).

DEFINITION: Two subspaces \(\bs{V}\) and \(\bs{W}\) of a vector space are orthogonal if every vector \(\v\) in \(\bs{V}\) is perpendicular to every vector \(\w\) in \(\bs{W}\):

Note

Orthogonal subspaces: \(\v^T\w=0\) for all \(\v\) in \(\bs{V}\) and all \(\w\) in \(\bs{W}\).

Note

Every vector \(\x\) in the nullspace is perpendicular to every row of \(A\), because \(A\x=\0\). The nullspace \(\bs{N}(A)\) and the row space \(\bs{C}(A^T)\) are orthogonal subspaces of \(\R^n\).

Look at \(A\x=\0\). Each row multiplies \(\x\):

\[\begin{split}A\x=\bb &\rm{row\ }1&\\&\vdots&\\&\rm{row\ }m& \eb\bb \\ \x \\ \ \eb =\bb 0\\\vdots\\0 \eb \quad \begin{matrix} \leftarrow \\ \\ \leftarrow \end{matrix} \quad \begin{matrix} (\rm{row\ }1)\cd\x \rm{\ is\ zero} \\ \\ (\rm{row\ }m)\cd\x \rm{\ is\ zero} \end{matrix}\end{split}\]

The first equation says that row 1 is perpendicular to \(\x\). The last equation says that row \(m\) is perpendicular to \(\x\). Every row has a zero dot product with \(\x\). Then \(\x\) is also perpendicular to every combination of the rows. The whole row space \(\bs{C}(A^T)\) is orthogonal to \(\bs{N}(A)\).

The vectors in the row space are combinations \(A^T\y\) of the rows. Take the dot product of \(A^T\y\) with any \(\x\) in the nullspace. These vectors are perpendicular.

Nullspace orthogonal to row space:

\[\x^T(A^T\y) = (A\x)^T\y = \0^T\y =0.\]

Note

Every vector \(\y\) in the nullspace of \(A^T\) is perpendicular to every column of \(A\). The left nullspace \(\bs{N}(A^T)\) and the column space \(\bs{C}(A)\) are orthogonal in \(\R^m\).
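
A numerical spot check of both orthogonality statements (a sketch with the rank one matrix from Chapter 3.2; scipy.linalg.null_space returns an orthonormal basis of the nullspace):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1., 2.], [3., 6.]])     # rank one

    N = null_space(A)                      # basis for N(A)
    assert np.allclose(A @ N, 0)           # every row of A is perpendicular to N(A)

    M = null_space(A.T)                    # basis for N(A^T), the left nullspace
    assert np.allclose(A.T @ M, 0)         # every column of A is perpendicular to N(A^T)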

Orthogonal Complements

Important: The fundamental subspaces are more than just orthogonal (in pairs). Their dimensions are also right. Two lines could be perpendicular in \(\R^3\), but those lines could not be the row space and nullspace of a 3 by 3 matrix.

DEFINITION: The orthogonal complement of a subspace \(\bs{V}\) contains every vector that is perpendicular to \(\bs{V}\). This orthogonal subspace is denoted by \(\bs{V}^{\perp}\).

By this definition, the nullspace is the orthogonal complement of the row space. Every \(\x\) that is perpendicular to the rows satisfies \(A\x=\0\), and lies in the nullspace. The reverse is also true. If \(\v\) is orthogonal to the nullspace, it must be in the row space.

Note

Fundamental Theorem of Linear Algebra, Part 2:

  • \(\bs{N}(A)\) is the orthogonal complement of the row space \(\bs{C}(A^T)\) (in \(\R^n\)).

  • \(\bs{N}(A^T)\) is the orthogonal complement of the column space \(\bs{C}(A)\) (in \(\R^m\)).

Part 1 gave the dimensions of the subspaces. Part 2 gives the \(90^{\circ}\) angles between them. The point of “complements” is that every \(\x\) can be split into a row space component \(\x_r\) and a nullspace component \(\x_n\). When \(A\) multiplies \(\x=\x_r+\x_n\):

  • The nullspace component goes to zero: \(A\x_n=\0\).

  • The row space component goes to the column space: \(A\x_r=A\x\).

Every vector \(\b\) in the column space comes from one and only one vector \(\x_r\) in the row space.

Proof: If \(A\x_r=A\x^{\pr}_r\), the difference \(\x_r-\x^{\pr}_r\) is in the nullspace. It is also in the row space, where \(\x_r\) and \(\x^{\pr}_r\) came from. This difference must be the zero vector, because the nullspace and row space are perpendicular. Therefore \(\x_r = \x^{\pr}_r\).

There is an \(r\) by \(r\) invertible matrix hiding inside \(A\), if we throw away the two nullspaces. From the row space to the column space, \(A\) is invertible.

Every matrix can be diagonalized, when we choose the right bases for \(\R^n\) and \(\R^m\). This Singular Value Decomposition has become extremely important in applications.

A row of \(A\) can’t be in the nullspace of \(A\) (except for a zero row). The only vector in two orthogonal subspaces is the zero vector. If a vector \(\v\) is orthogonal to itself then \(\v\) is the zero vector.

Drawing the Big Picture

Refer to the textbook Page 199.

Combining Bases from Subspaces

Note

  • Any \(n\) independent vectors in \(\R^n\) must span \(\R^n\). So they are a basis.

  • Any \(n\) vectors that span \(\R^n\) must be independent. So they are a basis.

Note

  • If the \(n\) columns of \(A\) are independent, they span \(\R^n\). So \(A\x=\b\) is solvable.

  • If the \(n\) columns span \(\R^n\), they are independent. So \(A\x=\b\) has only one solution.

Uniqueness implies existence and existence implies uniqueness. Then \(A\) is invertible. If there are no free variables, the solution \(\x\) is unique. There must be \(n\) pivot columns. Then back substitution solves \(A\x=\b\) (the solution exists).

Starting in the opposite direction, suppose that \(A\x=\b\) can be solved for every \(\b\) (existence of solutions). Then elimination produced no zero rows. There are \(n\) pivots and no free variables. The nullspace contains only \(\x=\0\) (uniqueness of solutions).

With bases for the row space and the nullspace, we have \(r+(n-r)=n\) vectors. This is the right number. Those \(n\) vectors are independent. Therefore they span \(\R^n\).

Tip

Each \(\x\) is the sum \(\x_r+\x_n\) of a row space vector \(\x_r\) and a nullspace vector \(\x_n\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 4.2 Projections

The projection of \(\b\) is \(P\b\).

  1. What are the projections of \(\b=(2,3,4)\) onto the \(z\) axis and the \(xy\) plane?

  2. What matrices \(P_1\) and \(P_2\) produce those projections onto a line and a plane?

When \(\b\) is projected onto a line, its projection \(\p\) is the part of \(\b\) along that line. If \(\b\) is projected onto a plane, \(\p\) is the part in that plane. The projection \(\p\) is \(P\b\).

The projection matrix \(P\) multiplies \(\b\) to give \(\p\).

The projection onto the \(z\) axis we call \(\p_1\). The second projection drops straight down to the \(xy\) plane. Start with \(\b=(2,3,4)\). The projection across gives \(\p_1=(0,0,4)\). The projection down gives \(\p_2=(2,3,0)\). Those are the parts of \(\b\) along the \(z\) axis and in the \(xy\) plane.

Projection Matrix Onto the \(z\) axis:

\[\begin{split}P_1 = \bb 0&0&0\\0&0&0\\0&0&1 \eb.\end{split}\]

Projection Matrix Onto the \(xy\) plane:

\[\begin{split}P_2 = \bb 1&0&0\\0&1&0\\0&0&0 \eb.\end{split}\]

\(P_1\) picks out the \(z\) component of every vector. \(P_2\) picks out the \(x\) and \(y\) components.

\[\begin{split}\p_1=P_1\b=\bb 0&0&0\\0&0&0\\0&0&1 \eb\bb x\\y\\z \eb=\bb 0\\0\\z \eb\quad \p_2=P_2\b=\bb 1&0&0\\0&1&0\\0&0&0 \eb\bb x\\y\\z \eb=\bb x\\y\\0 \eb.\end{split}\]

In this case the projections \(\p_1\) and \(\p_2\) are perpendicular. The \(xy\) plane and the \(z\) axis are orthogonal subspaces. More than just orthogonal, the line and plane are orthogonal complements. The projections \(\p_1\) and \(\p_2\) are exactly those two parts of \(\b\):

  • The vectors give \(\p_1+\p_2=\b\).

  • The matrices give \(P_1+P_2=I\).

Our problem is to project any \(\b\) onto the column space of any \(m\) by \(n\) matrix. Start with a line (dimension \(n=1\)). The matrix \(A\) will have only one column. Call it \(\a\).

Projection Onto a Line

A line goes through the origin in the direction of \(\a=(a_1,\cds,a_m)\). Along that line, we want the point \(\p\) closest to \(\b=(b_1,\cds,b_m)\). The key to projection is orthogonality: The line from \(\b\) to \(\p\) is perpendicular to the vector \(\a\). We now compute \(\p\) by algebra.

The projection \(\p\) will be some multiple of \(\a\). Call it \(\p=\wh{x}\a=\) “\(x\) hat” times \(\a\). Computing this number \(\wh{x}\) will give the vector \(\p\). Then from the formula for \(\p\), we will read off the projection matrix \(P\). These three steps will lead to all projection matrices: find \(\wh{x}\), then find the vector \(\p\), then find the matrix \(P\).

\(\b-\p\) is the “error” \(\e=\b-\wh{x}\a\). It is perpendicular to \(\a\)–this will determine \(\wh{x}\). Use the fact that \(\b-\wh{x}\a\) is perpendicular to \(\a\) when their dot product is zero:

Note

Projecting \(\b\) onto \(\a\) with error \(\e=\b-\wh{x}\a\): \(\a\cd(\b-\wh{x}\a)=0\) or \(\a\cd\b-\wh{x}\a\cd\a=0\):

  • \(\dp\wh{x}=\frac{\a\cd\b}{\a\cd\a}=\frac{\a^T\b}{\a^T\a}\).

Note

The projection of \(\b\) onto the line through \(\a\) is the vector: \(\dp\p=\wh{x}\a=\frac{\a^T\b}{\a^T\a}\a\).

  • Special case 1: If \(\b=\a\) then \(\wh{x}=1\). The projection of \(\a\) onto \(\a\) is itself. \(P\a=\a\).

  • Special case 2: If \(\b\) is perpendicular to \(\a\) then \(\a^T\b=0\). The projection is \(\p=\0\).

Look at the right triangle of \(\b,\p,\e\). The vector \(\b\) is split into two parts–its component along the line is \(\p\), its perpendicular part is \(\e\). Those two sides \(\p\) and \(\e\) have length \(\lv\p\rv=\lv\b\rv\cos\theta\) and \(\lv\e\rv=\lv\b\rv\sin\theta\). Trigonometry matches the dot product:

\[\p=\frac{\a^T\b}{\a^T\a}\a\rm{\ has\ length\ }\lv\p\rv= \frac{\lv\a\rv\lv\b\rv\cos\theta}{\lv\a\rv^2}\lv\a\rv=\lv\b\rv\cos\theta.\]

The dot product is a lot simpler than getting involved with \(\cos\theta\) and the length of \(\b\).

Note

Projection matrix \(P\): \(\dp\p=\a\wh{x}=\a\frac{\a^T\b}{\a^T\a}=P\b\) when the matrix is \(\dp P=\frac{\a\a^T}{\a^T\a}\).

\(P\) is a column times a row! The column is \(\a\), the row is \(\a^T\). Then divide by the number \(\a^T\a\). The projection matrix \(P\) is \(m\) by \(m\), but its rank is one. We are projecting onto a one-dimensional subspace, the line through \(\a\). That line is the column space of \(P\).

Projecting a second time doesn’t change anything, so \(P^2=P\).

Note that \((I-P)\b=\b-\p=\e\). When \(P\) projects onto one subspace, \(I-P\) projects onto the perpendicular subspace. Here \(I-P\) projects onto the plane perpendicular to \(\a\).
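
A minimal MATLAB/Octave sketch of the three steps for a line (the vectors \(\a\) and \(\b\) are made-up examples): compute \(\wh{x}\), then \(\p\), then \(P\), and check that \(\e\) is perpendicular to \(\a\) and that \(P^2=P\):

a = [1; 2; 2];              % direction of the line (made-up example)
b = [3; 0; 3];              % vector to be projected
xhat = (a'*b)/(a'*a);       % the number x hat
p = xhat*a;                 % projection of b onto the line through a
P = (a*a')/(a'*a);          % rank-one projection matrix
e = b - p;                  % error vector
disp(a'*e)                  % zero: e is perpendicular to a
disp(norm(P*P - P))         % zero: projecting twice changes nothing
disp(norm((eye(3)-P)*b - e))% zero: (I-P)b is the error e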

Projection Onto a Subspace

Start with \(n\) vectors \(\a_1,\cds,\a_n\) in \(\R^m\). Assume that these \(\a\)’s are linearly independent.

Problem: Find the combination \(\p=\wh{x}_1\a_1+\cds+\wh{x}_n\a_n\) closest to a given vector \(\b\). We are projecting each \(\b\) in \(\R^m\) onto the subspace spanned by the \(\a\)’s.

With \(n=1\) (one vector \(\a_1\)) this is projection onto a line. The line is the column space of \(A\), which has just one column. In general the matrix \(A\) has \(n\) columns \(\a_1,\cds,\a_n\).

The combinations in \(\R^m\) are the vectors \(A\x\) in the column space. We are looking for the particular combination \(\p=A\wh{\x}\). That choice is \(\wh{x}=\a^T\b/\a^T\a\) when \(n=1\).

We compute projections onto \(n\)-dimensional subspaces in three steps as before: Find the vector \(\wh{\x}\), find the projection \(\p=A\wh{\x}\), find the projection matrix \(P\).

The error vector \(\b-A\wh{\x}\) is perpendicular to the subspace. It makes a right angle with all the vectors \(\a_1,\cds,\a_n\) in the basis:

Note

\(\begin{matrix}\a_1^T(\b-A\wh{\x})=0\\\vdots\\\a_n^T(\b-A\wh{\x})=0\end{matrix}\quad\) or \(\quad\bb -&\a_1^T&-\\&\vdots&\\-&\a_n^T&- \eb\bb \\\b-A\wh{\x}\\\ \eb=\bb \\\0\\\ \eb\).

The matrix with those rows \(\a_i^T\) is \(A^T\). The \(n\) equations are exactly \(A^T(\b-A\wh{\x})=\0\).

Note

The combination \(\p=\wh{x}_1\a_1+\cds+\wh{x}_n\a_n=A\wh{\x}\) that is closest to \(\b\) comes from \(\wh{\x}\):

  • Find \(\wh{\x}\ (n \times 1):\quad A^T(\b-A\wh{\x})=\0\quad\rm{or}\quad A^TA\wh{\x}=A^T\b\).

This symmetric matrix \(A^TA\) is \(n\) by \(n\). It is invertible if the \(\a\)’s are independent. The solution is \(\wh{\x}=(A^TA)^{-1}A^T\b\). The projection of \(\b\) onto the subspace is \(\p\):

  • Find \(\p\ (m \times 1):\quad \p=A\wh{\x}=A(A^TA)^{-1}A^T\b\).

The next formula picks out the projection matrix that is multiplying \(\b\):

  • Find \(P\ (m \times m):\quad P=A(A^TA)^{-1}A^T\).

Compare with projection onto a line, when \(A\) has only one column: \(A^TA\) is \(\a^T\a\).

Note

For \(n=1\): \(\dp\wh{x}=\frac{\a^T\b}{\a^T\a}\) and \(\dp\p=\a\frac{\a^T\b}{\a^T\a}\) and \(\dp P=\frac{\a\a^T}{\a^T\a}\).
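
Here is a minimal MATLAB/Octave sketch of those three steps for a subspace (the 4 by 2 matrix \(A\) and the vector \(\b\) are made-up examples, not from the textbook):

A = [1 0; 1 1; 1 3; 1 4];   % two independent columns in R^4
b = [0; 8; 8; 20];          % vector to be projected
xhat = (A'*A) \ (A'*b);     % solve the normal equations A'A xhat = A'b
p = A*xhat;                 % projection of b onto the column space of A
P = A*((A'*A) \ A');        % projection matrix P = A (A'A)^{-1} A'
e = b - p;                  % error vector
disp(A'*e)                  % zero vector: e is perpendicular to every column
disp(norm(P*b - p))         % zero: P*b reproduces p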

The key step was \(A^T(\b-A\wh{\x})=\0\). We used geometry (\(\e\) is orthogonal to each \(\a\)). Linear algebra gives this “normal equation” too, in a very quick and beautiful way:

  1. Our subspace is the column space of \(A\).

  2. The error vector \(\b-A\wh{\x}\) is perpendicular to that column space.

  3. Therefore \(\b-A\wh{\x}\) is in the nullspace of \(A^T\)! This means \(A^T(\b-A\wh{\x})=\0\).

The left nullspace is important in projections. That nullspace of \(A^T\) contains the error vector \(\e=\b-A\wh{\x}\). The vector \(\b\) is being split into the projection \(\p\) and the error \(\e=\b-\p\). Projection produces a right triangle with sides \(\p,\e,\b\).

Warning

The matrix \(P=A(A^TA)^{-1}A^T\) is deceptive. You cannot split \((A^TA)^{-1}\) into \(A^{-1}\) times \((A^T)^{-1}\). The matrix \(A\) is rectangular. It has no inverse matrix.

Tip

\(A^TA\) is invertible if and only if \(A\) has linearly independent columns.

Proof: \(A^TA\) is a square matrix (\(n\) by \(n\)). For every matrix \(A\), we will now show that \(A^TA\) has the same nullspace as \(A\).

Let \(A\) be any matrix. If \(\x\) is in its nullspace, then \(A\x=\0\). Multiplying by \(A^T\) gives \(A^TA\x=\0\). So \(\x\) is also in the nullspace of \(A^TA\).

From \(A^TA\x=\0\) we must prove \(A\x=\0\). We can’t multiply by \((A^T)\im\), which generally doesn’t exist. Instead, just multiply by \(\x^T\):

\[(\x^T)A^TA\x=0 \quad\rm{or}\quad (A\x)^T(A\x)=0 \quad\rm{or}\quad \lv A\x\rv ^2=0.\]

We have shown: If \(A^TA\x=\0\) then \(A\x\) has length zero. Therefore \(A\x=\0\). Every vector \(\x\) in one nullspace is in the other nullspace. If \(A^TA\) has dependent columns, so does \(A\). If \(A^TA\) has independent columns, so does \(A\). This is the good case: \(A^TA\) is invertible.

Tip

When \(A\) has independent columns, \(A^TA\) is square, symmetric, and invertible.

Very brief summary: To find the projection \(\p=\wh{x}_1\a_1+\cds+\wh{x}_n\a_n\), solve \(A^TA\wh{\x}=A^T\b\). This gives \(\wh{\x}\). The projection is \(\p=A\wh{\x}\) and the error is \(\e=\b-\p=\b-A\wh{\x}\). The projection matrix \(P=A(A^TA)^{-1}A^T\) gives \(\p=P\b\).

This matrix satisfies \(P^2=P\). The distance from \(\b\) to the subspace \(\bs{C}(A)\) is \(\lv\e\rv\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 4.3 Least Squares Approximations

It often happens that \(A\x=\b\) has no solution. The usual reason is: too many equations. The matrix \(A\) has more rows than columns (\(m\) is greater than \(n\)).

We cannot always get the error \(\e=\b-A\x\) down to zero. When \(\e\) is zero, \(\x\) is an exact solution to \(A\x=\b\). When the length of \(\e\) is as small as possible, \(\wh{\x}\) is a least squares solution.

Note

When \(A\x=\b\) has no solution, multiply by \(A^T\) and solve \(A^TA\wh{\x}=A^T\b\).

A crucial application of least squares is fitting a straight line to \(m\) points. Start with three points: Find the closest line to the points \((0,6),(1,0),(2,0)\).

No straight line \(b=C+Dt\) goes through those three points. This 3 by 2 system has no solution: \(\b=(6,0,0)\) is not a combination of the columns \((1,1,1)\) and \((0,1,2)\).

\(A\x=\b\) is not solvable:

\[\begin{split}A=\bb 1&0\\1&1\\1&2 \eb\quad \x=\bb C\\D \eb\quad \b=\bb 6\\0\\0 \eb.\end{split}\]

We computed \(\wh{\x}=(5,-3)\). Those numbers are the best \(C\) and \(D\), so \(5-3t\) will be the best line for the 3 points.
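
That answer is easy to check in MATLAB/Octave; a short sketch of this exact 3-point example:

A = [1 0; 1 1; 1 2];        % columns (1,1,1) and (0,1,2)
b = [6; 0; 0];
xhat = (A'*A) \ (A'*b);     % normal equations give C = 5, D = -3
p = A*xhat;                 % heights on the best line: (5, 2, -1)
e = b - p;                  % errors (1, -2, 1)
disp(xhat')
disp(e')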

Minimizing the Error

How do we make the error \(\e=\b-A\x\) as small as possible?

By geometry: Every \(A\x\) lies in the plane of the columns \((1,1,1)\) and \((0,1,2)\). The nearest point to \(\b\) is the projection \(\p\).

The best choice for \(A\x\) is \(\p\). The smallest possible error is \(\e=\b-\p\), perpendicular to the columns. The three points at heights \((p_1,p_2,p_3)\) do lie on a line, because \(\p\) is in the column space of \(A\). In fitting a straight line, \(\wh{\x}\) is the best choice for \((C,D)\).

By algebra: Every vector \(\b\) splits into two parts. The part in the column space is \(\p\). The perpendicular part is \(\e\). There is an equation we cannot solve (\(A\x=\b\)). There is an equation \(A\wh{\x}=\p\) we can and do solve:

\[A\x=\b=\p+\e \rm{\ is\ impossible}\quad A\wh{\x}=\p \rm{\ is\ solvable} \quad \wh{\x} \rm{\ is\ } (A^TA)^{-1}A^T\b.\]

The solution to \(A\wh{\x}=\p\) leaves the least possible error (which is \(\e\)):

Squared length for any \(\x\):

\[\lv A\x-\b \rv^2=\lv A\x-\p \rv^2+\lv \e \rv^2.\]

This is the law \(c^2=a^2+b^2\) for a right triangle. The vector \(A\x-\p\) in the column space is perpendicular to \(\e\) in the left nullspace. We reduce \(A\x-\p\) to zero by choosing \(\x=\wh{\x}\). That leaves the smallest possible error \(\e=(e_1,e_2,e_3)\) which we can’t reduce.

The squared length of \(A\x-\b\) is minimized:

Note

The least squares solution \(\wh{\x}\) makes \(E=\lv A\x-\b \rv^2\) as small as possible.

The closest line misses by distances \(e_1,e_2,e_3=1,-2,1\). Those are vertical distances. The least squares line minimizes \(E=e_1^2+e_2^2+e_3^2\).

Notice that the errors \(1,-2,1\) add to zero. Reason: The error \(\e=(e_1,e_2,e_3)\) is perpendicular to the first column \((1,1,1)\) in \(A\). The dot product gives \(e_1+e_2+e_3=0\).

By calculus: Most functions are minimized by calculus! The graph bottoms out and the derivative in every direction is zero. Here the error function \(E\) to be minimized is a sum of squares \(e_1^2+e_2^2+e_3^2\) (the square of the error in each equation):

\[E=\lv A\x-\b \rv^2=(C+D\cd 0-6)^2+(C+D\cd 1)^2+(C+D\cd 2)^2.\]

With two unknowns \(C,D\), there are two derivatives–both zero at the minimum. They are “partial derivatives” because \(\partial{E}/\partial{C}\) treats \(D\) as constant and \(\partial{E}/\partial{D}\) treats \(C\) as constant:

\[ \begin{align}\begin{aligned}\partial{E}/\partial{C}=2(C+D\cd 0-6)+2(C+D\cd 1)+2(C+D\cd 2)=0\\\partial{E}/\partial{D}=2(C+D\cd 0-6)(0)+2(C+D\cd 1)(1)+2(C+D\cd 2)(2)=0\\3C+3D=6\\3C+5D=0\end{aligned}\end{align} \]

This matrix \(\bb 3&3\\3&5 \eb\) is \(A^TA\).

These equations are identical with \(A^TA\wh{\x}=A^T\b\). The best \(C\) and \(D\) are the components of \(\wh{\x}\). The equations from calculus are the same as the “normal equations” from linear algebra. These are the key equations of least squares:

Tip

The partial derivatives of \(\lv A\x-\b\rv^2\) are zero when \(A^TA\wh{\x}=A^T\b\).

The solution is \(C=5\) and \(D=-3\). Therefore \(b=5-3t\) is the best line. The errors are \(1,-2,1\), the components of the vector \(\e\).

The Big Picture for Least Squares

Refer to the textbook Page 222.

Fitting a Straight Line

Fitting a line is the clearest application of least squares. It starts with \(m>2\) points, hopefully near a straight line. At times \(t_1,\cds,t_m\) those \(m\) points are at heights \(b_1,\cds,b_m\). The best line \(C+Dt\) misses the points by vertical distances \(e_1,\cds,e_m\). No line is perfect, and the least squares line minimizes \(E=e_1^2+\cds+e_m^2\).

Now we allow \(m\) points (and \(m\) can be large).

A line goes through the \(m\) points when we exactly solve \(A\x=\b\). To fit the \(m\) points, we are trying to solve \(m\) equations (and we only have two unknowns!).

\[\begin{split}A\x=\b \quad\rm{is}\quad \begin{matrix}C+Dt_1=b_1\\C+Dt_2=b_2\\\vdots\\C+Dt_m=b_m\end{matrix} \quad\rm{with}\quad A=\bb 1&t_1\\1&t_2\\\vdots&\vdots\\1&t_m\eb.\end{split}\]

When \(\b\) happens to lie in the column space, the points happen to lie on a line. In that case \(\b=\p\). Then \(A\x=\b\) is solvable and the errors are \(\e=(0,\cds,0)\).

Tip

The closest line \(C+Dt\) has heights \(p_1,\cds,p_m\) with errors \(e_1,\cds,e_m\). Solve \(A^TA\wh{\x}=A^T\b\) for \(\wh{\x}=(C,D)\). The errors are \(e_i=b_i-C-Dt_i\).

Fitting points by a straight line is so important that we give the two equations \(A^TA\wh{\x}=A^T\b\), once and for all. The two columns of \(A\) are independent (unless all times \(t_i\) are the same). So we turn to least squares and solve \(A^TA\wh{\x}=A^T\b\).

Dot-product matrix:

\[\begin{split}A^TA=\bb 1&\cds&1\\t_1&\cds&t_m \eb\bb 1&t_1\\\vds&\vds\\1&t_m \eb=\bb m&\sum t_i\\\sum t_i&\sum t_i^2\eb.\end{split}\]

On the right side of the normal equation is the 2 by 1 vector \(A^T\b\):

\[\begin{split}A^T\b=\bb 1&\cds&1\\t_1&\cds&t_m \eb\bb b_1\\\vds\\b_m \eb=\bb \sum b_i\\\sum t_ib_i \eb.\end{split}\]

In a specific problem, these numbers are given. The best \(\wh{\x}=(C,D)\) is \((A^TA)^{-1}A^T\b\).

Note

The line \(C+Dt\) minimizes \(e_1^2+\cds+e_m^2=\lv A\x-\b \rv^2\) when \(A^TA\wh{\x}=A^T\b\):

  • \(A^TA\wh{\x}=A^T\b\quad\bb m&\sum t_i\\\sum t_i&\sum t_i^2\eb\bb C\\D\eb=\bb \sum b_i\\\sum t_ib_i \eb\).

The vertical errors at the \(m\) points on the line are the components of \(\e=\b-\p\). This error vector (the residual) \(\b-A\wh{\x}\) is perpendicular to the columns of \(A\) (geometry). The error is in the nullspace of \(A^T\) (linear algebra). The best \(\wh{\x}=(C,D)\) minimizes the total error \(E\), the sum of squares (calculus):

\[E(\x)=\lv A\x-\b \rv^2=(C+Dt_1-b_1)^2+\cds+(C+Dt_m-b_m)^2.\]

Calculus sets the derivatives \(\partial{E}/\partial{C}\) and \(\partial{E}/\partial{D}\) to zero and produces \(A^TA\wh{\x}=A^T\b\).

Other least squares problems have more than two unknowns. Fitting by the best parabola has \(n=3\) coefficients \(C,D,E\). In general we are fitting \(m\) data points by \(n\) parameters \(x_1,\cds,x_n\). The matrix \(A\) has \(n\) columns and \(n<m\). The derivatives of \(\lv A\x-\b \rv^2\) give the \(n\) equations \(A^TA\wh{\x}=A^T\b\). The derivative of a square is linear. This is why the method of least squares is so popular.

\(A\) has orthogonal columns when the measurement times \(t_i\) add to zero. When the columns of \(A\) are orthogonal, \(A^TA\) will be a diagonal matrix. Orthogonal columns are so helpful that it is worth shifting the times by subtracting the average time \(\wh{t}=(t_1+\cds+t_m)/m\).
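
A small MATLAB/Octave sketch of that shift (the times and heights are made-up data): after subtracting the mean time, the two columns of \(A\) are orthogonal, \(A^TA\) is diagonal, and \(C\) and \(D\) can be found separately:

t = [0; 1; 3; 4];           % made-up measurement times
b = [0; 8; 8; 20];          % made-up heights
T = t - mean(t);            % shifted times, which now add to zero
A = [ones(size(T)) T];      % columns (1,...,1) and (T_1,...,T_m)
disp(A'*A)                  % diagonal matrix: [m 0; 0 sum(T.^2)]
xhat = (A'*A) \ (A'*b);     % best C and D for the line C + D*T
disp(xhat')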

Dependent Columns in \(A\): What is \(\widehat{\boldsymbol{x}}\)?

In Section 7.4, the “pseudoinverse” of \(A\) will choose the shortest solution to \(A\wh{\x}=\p\).

Fitting by a Parabola

Problem Fit heights \(b_1,\cds,b_m\) at times \(t_1,\cds,t_m\) by a parabola \(C+Dt+Et^2\).

Solution With \(m>3\) points, the \(m\) equations for an exact fit are generally unsolvable:

\[\begin{split}\begin{matrix}C+Dt_1+Et_1^2=b_1\\\vds\\C+Dt_m+Et_m^2=b_m \end{matrix}\quad \begin{matrix}\rm{is\ }A\x=\b\rm{\ with}\\ \rm{the\ }m\rm{\ by\ }3\rm{\ matrix}\end{matrix}\quad A=\bb 1&t_1&t_1^2\\\vds&\vds&\vds\\1&t_m&t_m^2\eb.\end{split}\]

Least squares: The closest parabola \(C+Dt+Et^2\) chooses \(\wh{\x}=(C,D,E)\) to satisfy the three normal equations \(A^TA\wh{\x}=A^T\b\).
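
A hedged MATLAB/Octave sketch with made-up data (not the textbook’s numbers):

t = [0; 1; 2; 3; 4];        % made-up times, m = 5 points
b = [1; 0; 1; 4; 9];        % made-up heights
A = [ones(size(t)) t t.^2]; % m by 3 matrix for the model C + D*t + E*t^2
xhat = (A'*A) \ (A'*b);     % normal equations give (C, D, E)
p = A*xhat;                 % heights of the closest parabola
disp(xhat')
disp(norm(A'*(b - p)))      % zero: the error is perpendicular to all three columns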

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 4.4 Orthonormal Bases and Gram-Schmidt

The vectors \(\q_1,\cds,\q_n\) are orthogonal when their dot products \(\q_i\cd\q_j\) are zero. More exactly \(\q_i^T\q_j=0\) whenever \(i\neq j\). With one more step–just divide each vector by its length–the vectors become orthogonal unit vectors. Their lengths are all 1 (normal). Then the basis is called orthonormal.

Note

DEFINITION: The vectors \(\q_1,\cds,\q_n\) are orthonormal if:

  • \(\q_i^T\q_j=\left\{\begin{matrix}0\quad\rm{\ when\ } i \neq j\quad(\rm{orthogonal\ vectors})\ \ \ \quad\\1\quad\rm{\ when\ } i = j\quad(\rm{unit\ vectors}: \lv \q_i \rv=1)\end{matrix}\right.\).

A matrix with orthonormal columns is assigned the special letter \(Q\).

The matrix \(Q\) is easy to work with because \(Q^TQ=I\). \(Q\) is not required to be square.

Note

A matrix \(Q\) with orthonormal columns satisfies \(Q^TQ=I\):

  • \(Q^TQ=\bb -\q_1^T-\\-\q_2^T-\\\vds\\-\q_n^T- \eb\bb |&|&&|\\\q_1&\q_2&\cds&\q_n\\|&|&&| \eb=\bb 1&0&\cds&0\\0&1&\cds&0\\\vds&\vds&\ddots&\vds\\0&0&\cds&1 \eb=I\).

When row \(i\) of \(Q^T\) multiplies column \(j\) of \(Q\), the dot product is \(\q_i^T\q_j\). Off the diagonal (\(i\neq j\)) that dot product is zero by orthogonality. On the diagonal (\(i=j\)) the unit vectors give \(\q_i^T\q_i=\lv\q_i\rv^2=1\). Often \(Q\) is rectangular (\(m>n\)). Sometimes \(m=n\).

Tip

When \(Q\) is square, \(Q^TQ=I\) means that \(Q^T=Q\im\): transpose = inverse.

If the columns are only orthogonal (not unit vectors), dot products still give a diagonal matrix (not the identity matrix). This diagonal matrix is almost as good as \(I\).

To repeat: \(Q^TQ=I\) even when \(Q\) is rectangular. In that case \(Q^T\) is only an inverse from the left. For square matrices we also have \(QQ^T=I\), so \(Q^T\) is the two-sided inverse of \(Q\). The rows of a square \(Q\) are orthonormal like the columns. The inverse is the transpose. In this square case we call \(Q\) an orthogonal matrix.

Rotation: \(Q\) rotates every vector in the plane by the angle \(\theta\):

\[\begin{split}Q=\bb \cos\theta&-\sin\theta\\\sin\theta&\cos\theta \eb\quad\rm{and}\quad Q^T=Q\im =\bb \cos\theta&\sin\theta\\-\sin\theta&\cos\theta\eb.\end{split}\]

Those columns give an orthonormal basis for the plane \(\R^2\).

Permutation: These matrices change the order to \((y,z,x)\) and \((y,x)\):

\[\begin{split}\bb 0&1&0\\0&0&1\\1&0&0 \eb\bb x\\y\\z \eb=\bb y\\z\\x \eb\quad\rm{and}\quad \bb 0&1\\1&0 \eb\bb x\\y \eb=\bb y\\x \eb.\end{split}\]

The inverse of a permutation matrix is its transpose: \(Q\im=Q^T\):

\[\begin{split}\bb 0&0&1\\1&0&0\\0&1&0 \eb\bb y\\z\\x \eb=\bb x\\y\\z \eb\quad\rm{and}\quad \bb 0&1\\1&0 \eb\bb y\\x \eb=\bb x\\y \eb.\end{split}\]

Every permutation matrix is an orthogonal matrix.

Reflection: If \(\u\) is any unit vector, set \(Q=I-2\u\u^T\). Notice that \(\u\u^T\) is a matrix while \(\u^T\u\) is the number \(\lv\u\rv^2=1\). Then \(Q^T\) and \(Q\im\) both equal \(Q\):

\[Q^T=I-2\u\u^T=Q \quad\rm{and}\quad Q^TQ=I-4\u\u^T+4\u\u^T\u\u^T=I.\]

Reflection matrices \(I-2\u\u^T\) are symmetric and also orthogonal. Notice \(\u^T\u=1\) inside \(4\u\u^T\u\u^T\).

Rotations preserve the length of every vector. So do reflections. So do permutations. So does multiplication by any orthogonal matrix \(Q\): lengths and angles don’t change.

Proof: \(\lv Q\x\rv^2\) equals \(\lv\x\rv^2\) because \((Q\x)^T(Q\x)=\x^TQ^TQ\x=\x^TI\x=\x^T\x\).

Note

If \(Q\) has orthonormal columns \((Q^TQ=I)\), it leaves lengths unchanged:

  • Same length for \(Q\x\): \(\lv Q\x\rv=\lv\x\rv\) for every vector \(\x\).

\(Q\) also preserves dot products: \((Q\x)^T(Q\y)=\x^TQ^TQ\y=\x^T\y\). Just use \(Q^TQ=I\).
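
A quick numerical check in MATLAB/Octave (the angle and the vectors are arbitrary choices):

theta = 0.7;                % any angle
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];
x = [3; 4];  y = [-1; 2];
disp(norm(Q'*Q - eye(2)))   % zero: Q'Q = I
disp(norm(Q*x) - norm(x))   % zero (up to roundoff): lengths are unchanged
disp((Q*x)'*(Q*y) - x'*y)   % zero (up to roundoff): dot products are unchanged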

Projections Using Orthonormal Bases: \(Q\) Replaces \(A\)

Suppose the basis vectors are actually orthonormal. The \(\a\)’s become the \(\q\)’s. Then \(A^TA\) simplifies to \(Q^TQ=I\).

Tip

The least squares solution of \(Q\x=\b\) is \(\wh{\x}=Q^T\b\). The projection matrix is \(QQ^T\).

There are no matrices to invert. The “coupling matrix” or “correlation matrix” \(A^TA\) is now \(Q^TQ=I\). There is no coupling. When \(A\) is \(Q\), with orthonormal columns, here is \(\p=Q\wh{\x}=QQ^T\b\):

Note

Projection onto \(\q\)‘s: \(\p=\bb |&&|\\\q_1&\cds&\q_n\\|&&| \eb\bb \q_1^T\b\\\vds\\\q_n^T\b \eb =\q_1(\q_1^T\b)+\cds+\q_n(\q_n^T\b)\).

Important case: When \(Q\) is square and \(m=n\), the subspace is the whole space. Then \(Q^T=Q\im\) and \(\wh{\x}=Q^T\b\) is the same as \(\x=Q\im\b\). The solution is exact. The projection of \(\b\) onto the whole space is \(\b\) itself. In this case \(\p=\b\) and \(P=QQ^T=I\).

When \(\p=\b\), our formula assembles \(\b\) out of its 1-dimensional projections. If \(\q_1,\cds,\q_n\) is an orthonormal basis for the whole space, then \(Q\) is square. Every \(\b=QQ^T\b\) is the sum of its components along the \(\q\)’s:

Note

\(\b=\q_1(\q_1^T\b)+\q_2(\q_2^T\b)+\cds+\q_n(\q_n^T\b)\).

Transforms: \(QQ^T=I\) is the foundation of Fourier series and all the great “transforms” of applied mathematics. They break vectors \(\b\) or functions \(f(x)\) into perpendicular pieces. By adding the pieces, the inverse transform puts \(\b\) and \(f(x)\) back together.

The Gram-Schmidt Process

Start with three independent vectors \(\a,\b,\bs{c}\). We intend to construct three orthogonal vectors \(\A,\B,\C\). Then (at the end may be easiest) we divide \(\A,\B,\C\) by their lengths. That produces three orthonormal vectors \(\q_1=\A/\lv\A\rv,\q_2=\B/\lv\B\rv,\q_3=\C/\lv\C\rv\).

Gram-Schmidt: Begin by choosing \(\A=\a\). This first direction is accepted as it comes. The next direction \(\B\) must be perpendicular to \(\A\). Start with \(\b\) and subtract its projection along \(\A\). This leaves the perpendicular part, which is the orthogonal vector \(\B\):

Note

First Gram-Schmidt step: \(\dp \B=\b-\frac{\A^T\b}{\A^T\A}\A\).

Multiply the equation by \(\A^T\) to verify that \(\A^T\B=\A^T\b-\A^T\b=0\). This vector \(\B\) is what we have called the error vector \(\e\), perpendicular to \(\A\). Notice that \(\B\) is not zero (otherwise \(\a\) and \(\b\) would be dependent). The directions \(\A\) and \(\B\) are now set.

The third direction starts with \(\bs{c}\). This is not a combination of \(\A\) and \(\B\) (because \(\bs{c}\) is not a combination of \(\a\) and \(\b\)). But most likely \(\bs{c}\) is not perpendicular to \(\A\) and \(\B\). So subtract off its components in those two directions to get a perpendicular direction \(\C\):

Note

Next Gram-Schmidt step: \(\dp \C=\bs{c}-\frac{\A^T\bs{c}}{\A^T\A}\A-\frac{\B^T\bs{c}}{\B^T\B}\B\).

This is the one and only idea of the Gram-Schmidt process. Subtract from every new vector its projections in the directions already set. That idea is repeated at every step. If we had a fourth vector \(\bs{d}\), we would subtract three projections onto \(\A,\B,\C\) to get \(\bs{D}\).

At the end, or immediately when each one is found, divide the orthogonal vectors \(\A,\B,\C,\bs{D}\) by their lengths. The resulting vectors \(\q_1,\q_2,\q_3,\q_4\) are orthonormal.

The Factorization \(A=QR\)

We started with a matrix \(A\), whose columns were \(\a,\b,\bs{c}\). We ended with a matrix \(Q\), whose columns are \(\q_1,\q_2,\q_3\). Since the vectors \(\a,\b,\bs{c}\) are combinations of the \(\q\)’s (and vice versa), there must be a third matrix connecting \(A\) to \(Q\). This third matrix is the triangular \(R\) in \(A=QR\).

  • The vectors \(\a\) and \(\A\) and \(\q_1\) are all along a single line.

  • The vectors \(\a,\b\) and \(\A,\B\) and \(\q_1,\q_2\) are all in the same plane.

  • The vectors \(\a,\b,\bs{c}\) and \(\A,\B,\C\) and \(\q_1,\q_2,\q_3\) are in one subspace.

At every step \(\a_1,\cds,\a_k\) are combinations of \(\q_1,\cds,\q_k\). Later \(\q\)’s are not involved. The connecting matrix \(R\) is triangular, and we have \(A=QR\):

Note

\(\bb \\\a&\b&\bs{c}\\\ \eb=\bb \\\q_1&\q_2&\q_3\\\ \eb \bb\q_1^T\a&\q_1^T\b&\q_1^T\bs{c}\\&\q_2^T\b&\q_2^T\bs{c}\\&&\q_3^T\bs{c}\eb\) or \(A=QR\).

\(A=QR\) is Gram-Schmidt in a nutshell. Multiply by \(Q^T\) to recognize \(R=Q^TA\) above.

Note

Gram-Schmidt: From independent vectors \(\a_1,\cds,\a_n\), Gram-Schmidt constructs orthonormal vectors \(\q_1,\cds,\q_n\). The matrices with these columns satisfy \(A=QR\). Then \(R=Q^TA\) is upper triangular because later \(\q\)’s are orthogonal to earlier \(\a\)’s.

Any \(m\) by \(n\) matrix \(A\) with independent columns can be factored into \(A=QR\). The \(m\) by \(n\) matrix \(Q\) has orthonormal columns, and the square matrix \(R\) is upper triangular with positive diagonal. We must not forget why this is useful for least squares: \(\bs{A^TA=(QR)^TQR=R^TQ^TQR=R^TR}\). The least squares equation \(A^TA\wh{\x}=A^T\b\) simplifies to \(R^TR\wh{\x}=R^TQ^T\b\). Then finally we reach \(R\wh{\x}=Q^T\b\).

Note

Least squares: \(R^TR\wh{\x}=R^TQ^T\b\) or \(R\wh{\x}=Q^T\b\) or \(\wh{\x}=R\im Q^T\b\).

Instead of solving \(A\x=\b\), which is impossible, we solve \(R\wh{\x}=Q^T\b\) by back substitution–which is very fast. The real cost is the \(mn^2\) multiplications in the Gram-Schmidt process, which are needed to construct the orthogonal \(Q\) and the triangular \(R\) with \(A=QR\).
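
In MATLAB/Octave the same computation can use the built-in economy-size factorization qr(A,0) instead of the hand-written Gram-Schmidt loop below; a minimal sketch with a made-up \(A\) and \(\b\):

A = [1 0; 1 1; 1 2];        % made-up tall matrix with independent columns
b = [6; 0; 0];
[Q, R] = qr(A, 0);          % economy size: Q is m by n, R is n by n upper triangular
xhat = R \ (Q'*b);          % back substitution solves R xhat = Q'b
disp(xhat')                 % same answer as the normal equations (A'*A) \ (A'*b)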

Below is an informal code. The last line of that code normalizes \(\v\) (divides by \(r_{jj}=\lv\v\rv\)) to get the unit vector \(\q_j\):

\[r_{kj}=\sum_{i=1}^{m}q_{ik}v_{ij}\rm{\ and\ }v_{ij}=v_{ij}-q_{ik}r_{kj} \rm{\ and\ }r_{jj}=\left(\sum_{i=1}^{m}v_{ij}^2\right)^{1/2}\rm{\ and\ } q_{ij}=\frac{v_{ij}}{r_{jj}}.\]

Starting from \(\a,\b,\bs{c}=\a_1,\a_2,\a_3\) this code will construct \(\q_1\), then \(\B,\q_2\), then \(\C,\q_3\):

\[ \begin{align}\begin{aligned}\q_1=\a_1/\lv\a_1\rv\quad\B=\a_2-(\q_1^T\a_2)\q_1\quad\q_2=\B/\lv\B\rv\\\C^*=\a_3-(\q_1^T\a_3)\q_1\quad\C=\C^*-(\q_2^T\C^*)\q_2\quad\q_3=\C/\lv\C\rv\end{aligned}\end{align} \]
for j = 1:n
    v = A(:,j);                   % column j of A, to be orthogonalized
    for i = 1:j-1
        R(i,j) = Q(:,i)' * v;     % component of v along the earlier q_i
        v = v - R(i,j) * Q(:,i);  % subtract that projection
    end
    R(j,j) = norm(v);             % length of what is left
    Q(:,j) = v/R(j,j);            % normalize to get the unit vector q_j
end
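
After the loop has produced \(Q\) and \(R\) from a matrix \(A\) with independent columns, two quick checks confirm the factorization (a usage sketch, assuming \(A\), \(Q\), \(R\) are the arrays used above):

disp(norm(A - Q*R))               % zero (up to roundoff): A = QR
disp(norm(Q'*Q - eye(size(Q,2)))) % zero: the columns of Q are orthonormal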

To recover column \(j\) of \(A\), undo the last step and the middle steps of the code:

\[R(j,j)\q_j=(\v\rm{\ minus\ its\ projections})=(\rm{column\ }j\rm{\ of\ }A)-\sum_{i=1}^{j-1}R(i,j)\q_i.\]
\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 5 Determinants

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 5.1 The Properties of Determinants

The determinant of a square matrix is a single number. The determinant is zero when the matrix has no inverse. When \(A\) is invertible, the determinant of \(A\im\) is \(1/(\det A)\). In fact the determinant leads to a formula for every entry in \(A\im\).

This is one use for determinants–to find formulas for inverse matrices and pivots and solutions \(A\im\b\). For a large matrix we seldom use those formulas, because elimination is faster.

\[\begin{split}A=\bb a&b\\c&d \eb\rm{\ has\ inverse\ }A\im=\frac{1}{ad-bc}\bb d&-b\\-c&a \eb.\end{split}\]

When the determinant is \(ad-bc=0\), we are asked to divide by zero and we can’t–then \(A\) has no inverse. Dependent rows always lead to \(\det A=0\).

The determinant is also connected to the pivots. For a 2 by 2 matrix the pivots are \(a\) and \(d-(c/a)b\). The product of the pivots is the determinant:

Product of pivots: \(\dp a(d-\frac{c}{a}b)=ad-bc\) which is \(\det A\).

After a row exchange the pivots change to \(c\) and \(b-(a/c)d\). Those new pivots multiply to give \(bc-ad\). The row exchange to \(\bb c&d\\a&b \eb\) reversed the sign of the determinant.

The determinant of an \(n\) by \(n\) matrix can be found in three ways:

  1. Pivot formula: Multiply the \(n\) pivots (times 1 or -1).

  2. “Big” formula: Add up \(n!\) terms (times 1 or -1).

  3. Cofactor formula: Combine \(n\) smaller determinants (times 1 or -1).

The determinant changes sign when two rows (or two columns) are exchanged.

The identity matrix has determinant \(+1\). Exchange two rows and \(\det P=-1\). Exchange two more rows and the new permutation has \(\det P=+1\). Half of all permutations are even (\(\det P=1\)) and half are odd (\(\det P=-1\)). Starting from \(I\), half of the \(P\)’s involve an even number of exchanges and half require an odd number. In the 2 by 2 case, \(ad\) has a plus sign and \(bc\) has minus–coming from the row exchange:

\[\begin{split}\det \bb 1&0\\0&1 \eb = 1 \rm{\ and\ } \det \bb 0&1\\1&0 \eb = -1\end{split}\]

The Properties of the Determinant

The determinant is written in two ways, \(\det A\) and \(\bv A \ev\). Notice: Brackets for the matrix, straight bars for its determinant.

\[\begin{split}\rm{The\ determinant\ of\ } \bb a&b\\c&d \eb \rm{\ is\ } \bv a&b\\c&d \ev = ad - bc.\end{split}\]

We will check all rules with the 2 by 2 formula, but do not forget: The rules apply to any \(n\) by \(n\) matrix \(A\).

1. The determinant of the \(n\) by \(n\) identity matrix is 1.

\[\begin{split}\bv 1&0\\0&1 \ev=1 \quad\rm{and}\quad \bv 1\\&\ddots\\&&1 \ev=1.\end{split}\]

2. The determinant changes sign when two rows are exchanged (sign reversal).

\[\begin{split}\bv c&d\\a&b \ev=-\bv a&b\\c&d \ev.\end{split}\]

We can find \(\det P\) for any permutation matrix. Just exchange rows of \(I\) until you reach \(P\). Then \(\det P=+1\) for an even number of row exchanges and \(\det P=-1\) for an odd number.

3. The determinant is a linear function of each row separately (all other rows stay fixed).

If the first row is multiplied by \(t\), the determinant is multiplied by \(t\). If first rows are added, determinants are added. This rule only applies when the other rows do not change! Notice how \(c\) and \(d\) stay the same:

Note

Multiply row 1 by any number \(t\), \(\det\) is multiplied by \(t\):

  • \(\bv ta&tb\\c&d \ev=t \bv a&b\\c&d \ev\).

Add row 1 of \(A\) to row 1 of \(A^{\pr}\), then determinants add:

  • \(\bv a+a^{\pr}&b+b^{\pr}\\c&d \ev=\bv a&b\\c&d \ev+\bv a^{\pr}&b^{\pr}\\c&d \ev\).

In the first case, both sides are \(tad-tbc\). Then \(t\) factors out. In the second case, both sides are \(ad+a^{\pr}d-bc-b^{\pr}c\). These rules still apply when \(A\) is \(n\) by \(n\), and one row changes.

Combining multiplication and addition, we get any linear combination in one row. Rule 2 for row exchanges can put that row into the first row and back again.

This rule does not mean that \(\det 2I=2\det I\). To obtain \(2I\) we have to multiply both rows by 2, and the factor 2 comes out both times:

\[\begin{split}\bv 2&0\\0&2 \ev=2^2=4\quad\rm{and}\quad\bv t&0\\0&t \ev=t^2.\end{split}\]

4. If two rows of \(A\) are equal, then \(\det A=0\).

Equal rows:

\[\begin{split}\bv a&b\\a&b \ev=0.\end{split}\]

Rule 4 follows from rule 2. Exchange the two equal rows. The determinant \(D\) is supposed to change sign. But also \(D\) has to stay the same, because the matrix is not changed. The only number with \(-D=D\) is \(D=0\)–this must be the determinant.

A matrix with two equal rows has no inverse. Rule 4 makes \(\det A=0\). But matrices can be singular and determinants can be zero without having equal rows!

5. Subtracting a multiple of one row from another row leaves \(\det A\) unchanged.

\(l\) times row 1 from row 2:

\[\begin{split}\bv a&b\\c-la&d-lb \ev=\bv a&b\\c&d \ev\end{split}\]

Rule 3 (linearity) splits the left side into the right side plus another term \(-l\bv a&b\\c&d \ev\). This extra term is zero by rule 4: equal rows.

Conclusion: The determinant is not changed by the usual elimination steps from \(A\) to \(U\). Thus \(\det A=\det U\). If we can find determinants of triangular matrices \(U\), we can find determinants of all matrices \(A\). Every row exchange reverses the sign, so always \(\det A=\pm \det U\).

6. A matrix with a row of zeros has \(\det A=0\).

Row of zeros:

\[\begin{split}\bv 0&0\\c&d \ev=0\quad\rm{and}\quad\bv a&b\\0&0 \ev=0.\end{split}\]

7. If \(A\) is triangular then \(\det A=a_{11}a_{22}\cds a_{nn}=\) product of diagonal entries.

Triangular:

\[\begin{split}\bv a&b\\0&d \ev=ad\quad\rm{and\ also}\quad\bv a&0\\c&d \ev=ad.\end{split}\]

Suppose all diagonal entries are nonzero. Remove the off-diagonal entries by elimination! By rule 5 the determinant is not changed–and now the matrix is diagonal:

Diagonal matrix:

\[\begin{split}\det\bb a_{11}&&&0\\&a_{22}\\&&\ddots\\0&&&a_{nn} \eb=(a_{11})(a_{22})\cds(a_{nn}).\end{split}\]

8. If \(A\) is singular then \(\det A=0\). If \(A\) is invertible then \(\det A\neq 0\).

Singular:

\[\begin{split}\bb a&b\\c&d \eb \rm{\ is\ singular\ if\ and\ only\ if\ } ad-bc=0.\end{split}\]

Proof: Elimination goes from \(A\) to \(U\). If \(A\) is singular then \(U\) has a zero row. The rules give \(\det A=\det U=0\). If \(A\) is invertible then \(U\) has the pivots along its diagonal. The product of nonzero pivots (using rule 7) gives a nonzero determinant:

Note

Multiply pivots \(\det A=\pm \det U=\pm(\rm{product\ of\ the\ pivots})\).

The pivots of a 2 by 2 matrix (if \(a\neq 0\)) are \(a\) and \(d-(c/a)b\):

\[\begin{split}\bv a&b\\c&d \ev=\bv a&b\\0&d-(c/a)b \ev=ad-bc.\end{split}\]

This is the first formula for the determinant. The sign in \(\pm\det U\) depends on whether the number of row exchanges is even or odd: \(+1\) or \(-1\) is the determinant of the permutation \(P\) that exchanges rows.

With no row exchanges, \(P=I\) and \(\det A=\det U=\) product of pivots. And \(\det L=1\):

\[\rm{If\ }PA=LU\rm{\ then\ }\det P\det A=\det L\det U\rm{\ and\ }\det A=\pm\det U.\]

9. The determinant of \(AB\) is \(\det A\) times \(\det B\): \(\bv AB\ev=\bv A\ev\bv B\ev\).

Product rule:

\[\begin{split}\bv a&b\\c&d \ev\bv p&q\\r&s \ev=\bv ap+br&aq+bs\\cp+dr&cq+ds \ev.\end{split}\]

When the matrix is \(A\im\), this rule says that the determinant of \(A\im\) is \(1/\det A\):

\(A\) times \(A\im\):

\[AA\im=I\quad\rm{so}\quad(\det A)(\det A\im)=\det I=1.\]

For the \(n\) by \(n\) case, here is a snappy proof that \(\bv AB\ev=\bv A\ev\bv B\ev\). When \(\bv B \ev\) is not zero, consider the ratio \(D(A)=\bv AB \ev/\bv B \ev\). Check that this ratio \(D(A)\) has the following properties 1,2,3. Then \(D(A)\) has to be the determinant and we have \(\bv AB \ev/\bv B \ev=\bv A \ev\).

Property 1 (Determinant of \(I\)): If \(A=I\) then the ratio \(D(A)\) becomes \(\bv B \ev/\bv B \ev=1\).

Property 2 (Sign reversal): When two rows of \(A\) are exchanged, so are the same two rows of \(AB\). Therefore \(\bv AB \ev\) changes sign and so does the ratio \(\bv AB \ev/\bv B \ev\).

Property 3 (Linearity): When row 1 of \(A\) is multiplied by \(t\), so is row 1 of \(AB\). This multiplies the determinant \(\bv AB \ev\) by \(t\). So the ratio \(\bv AB \ev/\bv B \ev\) is multiplied by \(t\).

Add row 1 of \(A\) to row 1 of \(A^{\pr}\). Then row 1 of \(AB\) adds to row 1 of \(A^{\pr}B\). By rule 3, determinants add. After dividing by \(\bv B \ev\), the ratios add–as desired.

Conclusion: This ratio \(\bv AB \ev/\bv B \ev\) has the same three properties that define \(\bv A \ev\). Therefore it equals \(\bv A \ev\). This proves the product rule \(\bv AB\ev=\bv A\ev\bv B\ev\). The case \(\bv B \ev=0\) is separate and easy, because \(AB\) is singular when \(B\) is singular. Then \(\bv AB\ev=\bv A\ev\bv B\ev\) is \(0=0\).

10. The transpose \(A^T\) has the same determinant as \(A\).

Transpose:

\[\begin{split}\bv a&b\\c&d \ev=\bv a&c\\b&d \ev\rm{\ since\ both\ sides\ equal\ }ad-bc.\end{split}\]

The equation \(\bv A^T \ev=\bv A \ev\) becomes \(0=0\) when \(A\) is singular (we know that \(A^T\) is also singular). Otherwise \(A\) has the usual factorization \(PA=LU\). Transposing both sides gives \(A^TP^T=U^TL^T\). The proof of \(\bv A \ev=\bv A^T \ev\) comes by using rule 9 for products:

\[\rm{Compare\ }\det P\det A=\det L\det U\rm{\ with\ }\det A^T\det P^T=\det U^T\det L^T.\]

First, \(\det L=\det L^T=1\) (both have 1’s on the diagonal). Second, \(\det U=\det U^T\) (those triangular matrices have the same diagonal). Third, \(\det P=\det P^T\) (permutations have \(P^TP=I\), so \(\bv P^T \ev\bv P \ev =1\) by rule 9; thus \(\bv P \ev\) and \(\bv P^T \ev\) both equal 1 or both equal -1). So \(L,U,P\) have the same determinants as \(L^T,U^T,P^T\) and this leaves \(\det A=\det A^T\).

Important comment on columns: Every rule for the rows can apply to the columns (just by transposing, since \(\bv A \ev=\bv A^T \ev\)). The determinant changes sign when two columns are exchanged. A zero column or two equal columns will make the determinant zero. If a column is multiplied by \(t\), so is the determinant. The determinant is a linear function of each column separately.
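
These rules are easy to test numerically. A MATLAB/Octave sanity check of rules 9 and 10 on made-up matrices (a check, not a proof):

A = [2 1 0; 1 3 1; 0 1 4];       % made-up invertible matrix
B = [1 2 0; 0 1 1; 2 0 1];
disp(det(A*B) - det(A)*det(B))   % zero: |AB| = |A||B|
disp(det(A') - det(A))           % zero: |A^T| = |A|
disp(det(inv(A)) - 1/det(A))     % zero (up to roundoff): |A^{-1}| = 1/|A|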

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 5.2 Permutations and Cofactors

\[\begin{split}A=\bb 2&-1&0&0\\-1&2&-1&0\\0&-1&2&-1\\0&0&-1&2 \eb\quad\rm{has}\quad\det A=5.\end{split}\]

We can find this determinant in all three ways: pivots, big formula, cofactors.

  1. The product of the pivots is \(2\cd\frac{3}{2}\cd\frac{4}{3}\cd\frac{5}{4}\). Cancellation produces 5.

  2. The “big formula” has \(4!=24\) terms. Only five terms are nonzero:

\[\det A=16-4-4-4+1=5.\]
  3. The numbers \(2,-1,0,0\) in the first row multiply their cofactors \(4,3,2,1\) from the other rows. That gives \(2\cd 4-1\cd 3=5\). Those cofactors are 3 by 3 determinants. Cofactors use the rows and columns that are not used by the entry in the first row.

Tip

Every term in a determinant uses each row and column once!

The Pivot Formula

When elimination leads to \(A=LU\), the pivots \(d_1,\cds,d_n\) are on the diagonal of the upper triangular \(U\). If no row exchanges are involved, multiply those pivots to find the determinant:

\[\det A=(\det L)(\det U)=(1)(d_1 d_2\cds d_n).\]

Note

\((\det P)(\det A)=(\det L)(\det U)\) gives \(\det A=\pm(d_1 d_2 \cds d_n)\).

The first pivots of this tridiagonal matrix \(A\) are \(2,\frac{3}{2},\frac{4}{3}\). The next are \(\frac{5}{4}\) and \(\frac{6}{5}\) and eventually \(\frac{n+1}{n}\). Factoring this \(n\) by \(n\) matrix reveals its determinant:

\[\begin{split}\bb 2&-1\\-1&2&-1\\&-1&2&\cd\\&&\cd&\cd&-1\\&&&-1&2 \eb= \bb 1\\-\frac{1}{2}&1\\&-\frac{2}{3}&1\\&&\cd&\cd\\&&&-\frac{n-1}{n}&1 \eb \bb 2&-1\\&\frac{3}{2}&-1\\&&\frac{4}{3}&-1\\&&&\cd&\cd\\&&&&\frac{n+1}{n} \eb.\end{split}\]

The pivots are on the diagonal of \(U\) (the last matrix). The \(n\) by \(n\) determinant is \(n+1\):

-1, 2, -1 matrix:

\[\det A=(2)\left(\frac{3}{2}\right)\left(\frac{4}{3}\right)\cds\left(\frac{n+1}{n}\right)=n+1.\]

Important point: The first pivots depend only on the upper left corner of the original matrix \(A\). This is a rule for all matrices without row exchanges.

The first \(k\) pivots come from the \(k\) by \(k\) matrix \(A_k\) in the top left corner of \(A\). The determinant of that corner submatrix \(A_k\) is \(d_1d_2\cds d_k\) (first \(k\) pivots).

Elimination deals with the matrix \(A_k\) in the upper left corner while starting on the whole matrix. We assume no row exchanges–then \(A=LU\) and \(A_k=L_kU_k\). Dividing one determinant by the previous determinant (\(\det A_k\) divided by \(\det A_{k-1}\)) cancels everything but the latest pivot \(d_k\). Each pivot is a ratio of determinants:

Note

Pivots from determinants: The \(k\)th pivot is \(\dp d_k=\frac{d_1d_2\cds d_k}{d_1d_2\cds d_{k-1}}=\frac{\det A_k}{\det A_{k-1}}\)

We don’t need row exchanges when all the upper left submatrices have \(\det A_k\neq 0\).
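
A numerical check of the pivot formula on the -1, 2, -1 matrix (MATLAB/Octave; the matrix is built with diag):

n = 6;
A = 2*eye(n) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);  % the -1, 2, -1 matrix
disp(det(A))                              % n + 1 = 7
[L, U] = lu(A);                           % no row exchanges are needed here
disp(prod(diag(U)))                       % product of the pivots 2, 3/2, ..., (n+1)/n
disp(det(A(1:3,1:3)) / det(A(1:2,1:2)))   % ratio of corner determinants = third pivot 4/3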

The Big Formula for Determinants

The formula has \(n!\) terms. Half the terms have minus signs (as in \(-bc\)). The other half have plus signs (as in \(ad\)). For \(n=3\) there are \(3!=(3)(2)(1)=6\) terms:

Note

3 by 3 determinant: \(\bv a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23} \\a_{31}&a_{32}&a_{33} \ev= \begin{matrix}+a_{11}a_{22}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}\\ -a_{11}a_{23}a_{32}-a_{12}a_{21}a_{33}-a_{13}a_{22}a_{31}\end{matrix}\).

Notice the pattern. Each product like \(a_{11}a_{23}a_{32}\) has one entry from each row. It also has one entry from each column. The column order 1, 3, 2 means that this particular term comes with a minus sign. The column 3, 1, 2 in \(a_{13}a_{21}a_{32}\) has a plus sign. It will be “permutations” that tell us the sign.

To derive the big formula I start with \(n=2\). The goal is to reach \(ad-bc\) in a systematic way. Break each row into two simpler rows:

\[\bb a&b \eb=\bb a&0 \eb+\bb 0&b \eb\quad\rm{and}\quad\bb c&d \eb=\bb c&0 \eb+\bb 0&d \eb.\]

Now apply linearity, first in row 1 (with row 2 fixed) and then in row 2 (with row 1 fixed):

\[\begin{split}\bv a&b\\c&d \ev=\bv a&0\\c&d \ev+\bv 0&b\\c&d \ev=\bv a&0\\c&0 \ev+\bv a&0\\0&d \ev+\bv 0&b\\c&0 \ev+\bv 0&b\\0&d \ev.\end{split}\]

The last line has \(2^2=4\) determinants. The first and fourth are zero because one row is a multiple of the other row. We are left with \(2!=2\) determinants to compute:

\[\begin{split}\bv a&0\\c&d \ev+\bv 0&b\\c&d \ev=ad\bv 1&0\\0&1 \ev+bc\bv 0&1\\1&0 \ev=ad-bc.\end{split}\]

The splitting led to permutation matrices. Their determinants give a plus or minus sign. The permutation tells the column sequence. In this case the column order is (1,2) or (2,1).

Now try \(n=3\). We pay attention only when the entries \(a_{ij}\) come from different columns, like \((3,1,2)\):

Note

\(\bv a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33} \ev=\begin{matrix}\bv a_{11}\\&a_{22}\\&&a_{33} \ev+\bv &a_{12}\\&&a_{23} \\a_{31} \ev+\bv &&a_{13}\\a_{21}\\&a_{32} \ev \\ +\bv a_{11}\\&&a_{23}\\&a_{32} \ev+\bv &a_{12}\\a_{21}\\&&a_{33} \ev+\bv &&a_{13}\\&a_{22}\\a_{31} \ev\end{matrix}\).

There are \(3!=6\) ways to order the columns, so six determinants. The six permutations of \((1,2,3)\) include the identity permutation \((1,2,3)\) from \(P=I\).

\[\bs{\rm{Column\ numbers}} = (1,2,3),(2,3,1),(3,1,2),(1,3,2),(2,1,3),(3,2,1).\]

The last three are odd permutations (one exchange). The first three are even permutations (0 or 2 exchanges). Factor out the \(a_{ij}\):

\[ \begin{align}\begin{aligned}\begin{split}\det A=a_{11}a_{22}a_{33}\bv 1\\&1\\&&1 \ev+a_{12}a_{23}a_{31}\bv &1\\&&1\\1 \ev+a_{13}a_{21}a_{32}\bv &&1\\1\\&1 \ev\end{split}\\\begin{split}+a_{11}a_{23}a_{32}\bv 1\\&&1\\&1 \ev+a_{12}a_{21}a_{33}\bv &1\\1\\&&1 \ev+ a_{13}a_{22}a_{31}\bv &&1\\&1\\1 \ev.\end{split}\end{aligned}\end{align} \]

The first three (even) permutations have \(\det P=+1\), the last three (odd) permutations have \(\det P=-1\).

Now you can see the \(n\) by \(n\) formula. There are \(n!\) orderings of the columns. The columns \((1,2,\cds,n)\) go in each possible order \((\alpha,\beta,\cds,\omega)\). Taking \(a_{1\alpha}\) from row 1 and \(a_{2\beta}\) from row 2 and eventually \(a_{n\omega}\) from row \(n\), the determinant contains the product \(a_{1\alpha}a_{2\beta}\cds a_{n\omega}\) times \(+1\) or \(-1\).

The determinant of \(A\) is the sum of these \(n!\) simple determinants, times 1 or -1. The simple determinants \(a_{1\alpha}a_{2\beta}\cds a_{n\omega}\) choose one entry from every row and column.

Note

BIG FORMULA: \(\det A=\) sum over all \(n!\) column permutations \(P=(\alpha,\beta,\cds,\omega)=\sum(\det P)a_{1\alpha}a_{2\beta}\cds a_{n\omega}\).

Determinant by Cofactors

When you separate out the factor \(a_{11}\) or \(a_{12}\) or \(a_{1\alpha}\) that comes from the first row, you see linearity. For 3 by 3, separate the usual 6 terms of the determinant into 3 pairs:

Note

\(\det A=a_{11}(a_{22}a_{33}-a_{23}a_{32})+a_{12}(a_{23}a_{31}-a_{21}a_{33})+a_{13}(a_{21}a_{32}-a_{22}a_{31})\).

Those three quantities in parentheses are called “cofactors”. They are 2 by 2 determinants, from row 2 and 3. The first row contributes the factors \(a_{11},a_{12},a_{13}\). The lower rows contribute the cofactors \(C_{11},C_{12},C_{13}\). Certainly the determinant \(a_{11}C_{11}+a_{12}C_{12}+a_{13}C_{13}\) depends linearly on \(a_{11},a_{12},a_{13}\).

The cofactor of \(a_{11}\) is \(C_{11}=a_{22}a_{33}-a_{23}a_{32}\). You can see it in this splitting:

\[\begin{split}\bv a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33} \ev= \bv a_{11}\\&a_{22}&a_{23}\\&a_{32}&a_{33} \ev+ \bv &a_{12}&\\a_{21}&&a_{23}\\a_{31}&&a_{33} \ev+ \bv &&a_{13}\\a_{21}&a_{22}&\\a_{31}&a_{32}& \ev.\end{split}\]

We are still choosing one entry from each row and column. Since \(a_{11}\) uses up row 1 and column 1, that leaves a 2 by 2 determinant as its cofactor.

The sign pattern for cofactors along the first row is plus-minus-plus-minus. You cross out row 1 and column \(j\) to get a submatrix \(M_{1j}\) of size \(n-1\). Multiply its determinant by the sign \((-1)^{1+j}\) to get the cofactor:

The cofactors along row 1 are \(C_{1j}=(-1)^{1+j}\det M_{1j}\).

The cofactor expansion is \(\det A=a_{11}C_{11}+a_{12}C_{12}+\cds+a_{1n}C_{1n}\).

Note: Whatever is possible for row 1 is possible for row \(i\). The entries \(a_{ij}\) in that row also have cofactors \(C_{ij}\). Those are determinants of order \(n-1\), multiplied by \((-1)^{i+j}\). Since \(a_{ij}\) accounts for row \(i\) and column \(j\), the submatrix \(M_{ij}\) throws out row \(i\) and column \(j\). The sign matrix shows the \(\pm\) pattern:

\[\begin{split}\bb +&-&+&-\\-&+&-&+\\+&-&+&-\\-&+&-&+ \eb.\end{split}\]

Note

The determinant is the dot product of any row \(i\) of \(A\) with its cofactors using other rows:

  • COFACTOR FORMULA: \(\det A=a_{i1}C_{i1}+a_{i2}C_{i2}+\cds+a_{in}C_{in}\).

Each cofactor \(C_{ij}\) (order \(n-1\), without row \(i\) and column \(j\)) includes its correct sign:

  • Cofactor: \(C_{ij}=(-1)^{i+j}\det M_{ij}\).

A determinant of order \(n\) is a combination of determinants of order \(n-1\). Each subdeterminant breaks into determinants of order \(n-2\). We could define all determinants by this cofactor formula. This rule goes from order \(n\) to \(n-1\) to \(n-2\) and eventually to order 1. Define the 1 by 1 determinant \(\bv a \ev\) to be the number \(a\). Then the cofactor method is complete.

We preferred to construct \(\det A\) from its properties (linearity, sign reversal, \(\det I=1\)). One last formula comes from the rule that \(\det A=\det A^T\). We can expand in cofactors, down a column instead of across a row. Down column \(j\) the entries are \(a_{1j}\) to \(a_{nj}\). The cofactors are \(C_{1j}\) to \(C_{nj}\). The determinant is the dot product:

Cofactors down column \(j\): \(\det A=a_{1j}C_{1j}+a_{2j}C_{2j}+\cds+a_{nj}C_{nj}\).

Cofactors are useful when matrices have many zeros.
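
As an illustration only (cofactor expansion is far too slow for large matrices), here is a small recursive MATLAB/Octave function, saved as cofactor_det.m, that expands along the first row:

function d = cofactor_det(A)
  % determinant by cofactor expansion along row 1 (illustration only)
  n = size(A,1);
  if n == 1
    d = A(1,1);                      % 1 by 1 case: det [a] = a
    return
  end
  d = 0;
  for j = 1:n
    M = A(2:n, [1:j-1, j+1:n]);      % submatrix M_1j: remove row 1 and column j
    d = d + (-1)^(1+j) * A(1,j) * cofactor_det(M);
  end
end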

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 5.3 Cramer’s Rule, Inverses, and Volumes

Cramer’s Rule solves \(A\x=\b\). A neat idea gives the first component \(x_1\). Replacing the first column of \(I\) by \(\x\) gives a matrix with determinant \(x_1\). When you multiply it by \(A\), the first column becomes \(A\x\) which is \(\b\). The other columns of \(B_1\) are copied from \(A\):

Key idea:

\[\begin{split}\bb \\\ &A& \\\ \eb\bb x_1&0&0\\x_2&1&0\\x_3&0&1 \eb= \bb b_1&a_{12}&a_{13}\\b_2&a_{22}&a_{23}\\b_3&a_{32}&a_{33} \eb=B_1.\end{split}\]

We multiplied a column at a time. Take determinants of the three matrices to find \(x_1\):

Product rule:

\[(\det A)(x_1)=\det B_1\quad\rm{or}\quad x_1=\frac{\det B_1}{\det A}.\]

This is the first component of \(\x\) in Cramer’s Rule! Changing a column of \(A\) gave \(B_1\). To find \(x_2\) and \(B_2\), put the vectors \(\x\) and \(\b\) into the second column of \(I\) and \(A\):

Same idea:

\[\begin{split}\bb \\\ \a_1&\a_2&\a_3\\\ \eb\bb 1&x_1&0\\0&x_2&0\\0&x_3&1 \eb=\bb \\\ \a_1&\b&\a_3\\\ \eb=B_2.\end{split}\]

Take determinants to find \((\det A)(x_2)=\det B_2\). This gives \(x_2=(\det B_2)/(\det A)\).

Note

CRAMER’s RULE: If \(\det A\) is not zero, \(A\x=\b\) is solved by determinants:

  • \(\dp x_1=\frac{\det B_1}{\det A}\quad x_2=\frac{\det B_2}{\det A}\quad\cds\quad x_n=\frac{\det B_n}{\det A}\).

The matrix \(B_j\) has the \(j\)th column of \(A\) replaced by the vector \(\b\).

To solve an \(n\) by \(n\) system, Cramer’s Rule evaluates \(n+1\) determinants (of \(A\) and the \(n\) different \(B\)’s).
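
A direct MATLAB/Octave sketch of the rule (fine for small \(n\), far too expensive for large \(n\); the matrix and right side are made-up):

A = [2 1 0; 1 3 1; 0 1 4];  % made-up invertible matrix
b = [1; 2; 3];
n = length(b);
x = zeros(n,1);
for j = 1:n
    Bj = A;  Bj(:,j) = b;   % B_j: column j of A replaced by b
    x(j) = det(Bj) / det(A);
end
disp(norm(A*x - b))         % zero: x solves Ax = b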

\(A\im\) involves the cofactors. When the right side is a column of the identity matrix \(I\), as in \(AA\im=I\), the determinant of each \(B_j\) in Cramer’s Rule is a cofactor of \(A\).

You can see those cofactors for \(n=3\). Solve \(A\x=(1,0,0)\) to find column 1 of \(A\im\):

Determinant of \(B\)‘s = Cofactors of \(A\):

\[\begin{split}\bv 1&a_{12}&a_{13}\\0&a_{22}&a_{23}\\0&a_{32}&a_{33} \ev\quad \bv a_{11}&1&a_{13}\\a_{21}&0&a_{23}\\a_{31}&0&a_{33} \ev\quad \bv a_{11}&a_{12}&1\\a_{21}&a_{22}&0\\a_{31}&a_{32}&0 \ev\end{split}\]

The first determinant \(\bv B_1 \ev\) is the cofactor \(C_{11}=a_{22}a_{33}-a_{23}a_{32}\). Then \(\bv B_2 \ev\) is the cofactor \(C_{12}\). Notice that the correct minus sign appears in \(-(a_{21}a_{33}-a_{23}a_{31})\). This cofactor \(C_{12}\) goes into column 1 of \(A\im\). When we divide by \(\det A\), we have the inverse matrix.

Note

The \(i,j\) entry of \(A\im\) is the cofactor \(C_{ji}\) (not \(C_{ij}\)) divided by \(\det A\).

  • FORMULA FOR \(A\im\): \(\dp(A\im)_{ij}=\frac{C_{ji}}{\det A}\quad\rm{and}\quad A\im=\frac{C^T}{\det A}\).

The cofactors \(C_{ij}\) go into the “cofactor matrix” \(C\). The transpose of \(C\) leads to \(A\im\). To compute the \(i,j\) entry of \(A\im\), cross out row \(j\) and column \(i\) of \(A\). Multiply the determinant by \((-1)^{i+j}\) to get the cofactor \(C_{ji}\), and divide by \(\det A\).

Summary: In solving \(AA\im=I\), each column of \(I\) leads to a column of \(A\im\). Every entry of \(A\im\) is a ratio: a determinant of size \(n-1\) divided by a determinant of size \(n\).
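The cofactor formula for \(A\im\) can be checked numerically. A rough sketch (assuming NumPy; practical only for small \(n\), since it computes \(n^2\) minors):

    import numpy as np

    def cofactor_inverse(A):
        """A^{-1} = C^T / det(A), where C_ij = (-1)^(i+j) times the i,j minor of A."""
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return C.T / np.linalg.det(A)      # transpose of the cofactor matrix

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    print(np.allclose(cofactor_inverse(A), np.linalg.inv(A)))   # True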

Direct proof of the formula \(A\im=C^T/\det A\): This means \(AC^T=(\det A)I\):

Note

\(\bb a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33} \eb\bb C_{11}&C_{12}&C_{13}\\C_{21}&C_{22}&C_{23}\\C_{31}&C_{32}&C_{33} \eb= \bb \det A&0&0\\0&\det A&0\\0&0&\det A \eb\).

(Row 1 of \(A\)) times (column 1 of \(C^T\)) yields the first \(\det A\) on the right:

\[a_{11}C_{11}+a_{12}C_{12}+a_{13}C_{13}=\det A\]

This is exactly the cofactor rule.

Similarly row 2 of \(A\) times column 2 of \(C^T\) (notice the transpose) also yields \(\det A\). The entries \(a_{2j}\) are multiplying cofactors \(C_{2j}\) as they should, to give the determinant.

Row 2 of \(A\), Row 1 of \(C\):

\[a_{21}C_{11}+a_{22}C_{12}+a_{23}C_{13}=0.\]

This is the cofactor rule for a new matrix, when the second row of \(A\) is copied into its first row. The new matrix \(A^*\) has two equal rows, so \(\det A^*=0\). Notice that \(A^*\) has the same cofactors \(C_{11},C_{12},C_{13}\) as \(A\)–because all rows agree after the first row. Thus the remarkable multiplication is correct:

\[AC^T=(\det A)I\quad\rm{or}\quad A\im=\frac{C^T}{\det A}.\]

The inverse of a triangular matrix is triangular. Cofactors give a reason why.

Area of a Triangle

If we know the corners \((x_1,y_1)\) and \((x_2,y_2)\) and \((x_3,y_3)\) of a triangle, what is the area? Using the corners to find the base and height is not a good way to compute area.

Determinants are the best way to find area. The area of a triangle is half of a 3 by 3 determinant. The square roots in the base and height cancel out in the good formula. If one corner is at the origin, the determinant is only 2 by 2.

Note

The triangle with corners \((x_1,y_1)\) and \((x_2,y_2)\) and \((x_3,y_3)\) has \(\dp\rm{area}=\frac{\rm{determinant}}{2}\):

  • Area of triangle: \(\dp\frac{1}{2}\bv x_1&y_1&1\\x_2&y_2&1\\x_3&y_3&1 \ev\quad\) \(\dp\rm{Area}=\frac{1}{2}\bv x_1&y_1\\x_2&y_2 \ev\rm{\ when\ }(x_3,y_3)=(0,0)\).

\[\begin{split}\rm{Area} = \frac{1}{2}\bv x_1&y_1&1\\x_2&y_2&1\\x_3&y_3&1 \ev = \frac{1}{2} (x_1y_2-x_2y_1)+\frac{1}{2}(x_2y_3-x_3y_2)+\frac{1}{2}(x_3y_1-x_1y_3).\end{split}\]

If \((0,0)\) is outside the triangle, two of the special areas can be negative–but the sum is still correct. The real problem is to explain the area of a triangle with corner \((0,0)\).
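A small sketch of the area formula (assuming NumPy; the corners of a 3-4-5 right triangle are an arbitrary test case):

    import numpy as np

    def triangle_area(p1, p2, p3):
        """Half the absolute value of the 3 by 3 determinant with rows (x, y, 1)."""
        M = np.array([[p1[0], p1[1], 1.0],
                      [p2[0], p2[1], 1.0],
                      [p3[0], p3[1], 1.0]])
        return 0.5 * abs(np.linalg.det(M))

    print(triangle_area((0, 0), (4, 0), (0, 3)))   # 6.0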

Why is \(\frac{1}{2}|x_1y_2-x_2y_1|\) the area of this triangle? We can remove the factor \(\frac{1}{2}\) for a parallelogram (twice as big). We now prove that the parallelogram area is the determinant \(x_1y_2-x_2y_1\).

Proof that a parallelogram starting from \((0,0)\) has area = 2 by 2 determinant:

We show that the area has the same properties 1-2-3 as the determinant. Those three rules defined the determinant and led to all its other properties.

  1. When \(A=I\), the parallelogram becomes the unit square. Its area is \(\det I=1\).

  2. When rows are exchanged, the determinant reverses sign. The absolute value (positive area) stays the same–it is the same parallelogram.

  3. If row 1 is multiplied by \(t\), the area is also multiplied by \(t\). If a new row \((x_1\pr,y_1\pr)\) is added to \((x_1,y_1)\), the new area is added to the old area.

The \(n\) edges going out from the origin are given by the rows of an \(n\) by \(n\) matrix. The box is completed by more edges, like the parallelogram. For a three-dimensional box, the volume equals the absolute value of \(\det A\).

The unit cube has volume = 1, which is \(\det I\). Row exchanges or edge exchanges leave the same box and the same absolute volume. The determinant changes sign, to indicate whether the edges are a right-handed triple (\(\det A>0\)) or a left-handed triple (\(\det A<0\)). The box volume follows the rules for determinants, so the volume of the box equals the absolute value of \(\det A\).

In calculus, the box is infinitesimally small! To integrate over a circle, we might change \(x\) and \(y\) to \(r\) and \(\theta\). Those are polar coordinates: \(x=r\cos\theta\) and \(y=r\sin\theta\). The area of a “polar box” is a determinant \(J\) times \(dr\ d\theta\):

Area \(r\ dr\ d\theta\) in calculus:

\[\begin{split}J=\bv \partial x/\partial r&\partial x/\partial \theta\\ \partial y/\partial r&\partial y/\partial \theta \ev= \bv \cos\theta&-r\sin\theta\\\sin\theta&r\cos\theta \ev = r.\end{split}\]

The determinant is the \(r\) in the small area \(dA=r\ dr\ d\theta\). The stretching factor \(J\) goes into double integrals just as \(dx/du\) goes into an ordinary integral \(\int dx=\int (dx/du)du\). For triple integrals the Jacobian matrix \(J\) with nine derivatives will be 3 by 3.

The Cross Product

Start with vectors \(\u=(u_1,u_2,u_3)\) and \(\v=(v_1,v_2,v_3)\). Unlike the dot product, which is a number, the cross product is a vector–also in three dimensions. It is written \(\u\times\v\). The components of this cross product are 2 by 2 cofactors.

Note

DEFINITION: The cross product of \(\u=(u_1,u_2,u_3)\) and \(\v=(v_1,v_2,v_3)\) is a vector:

  • \(\u\times\v=\bv \i&\j&\bs{k}\\u_1&u_2&u_3\\v_1&v_2&v_3 \ev= (u_2v_3-u_3v_2)\i+(u_3v_1-u_1v_3)\j+(u_1v_2-u_2v_1)\bs{k}\).

This vector \(\u\times\v\) is perpendicular to \(\u\) and \(\v\). The cross product \(\v\times\u\) is \(-(\u\times\v)\).

Comment: The 3 by 3 determinant is the easiest way to remember \(\u\times\v\). It is not especially legal, because the first row contains vectors \(\i,\j,\bs{k}\) and the other rows contain numbers. In the determinant, the vector \(\i=(1,0,0)\) multiplies \(u_2v_3-u_3v_2\). The result is \((u_2v_3-u_3v_2,0,0)\), which displays the first component of the cross product.

Property 1: \(\v\times\u\) reverses rows 2 and 3 in the determinant so it equals \(-(\u\times\v)\).

Property 2: The cross product \(\u\times\v\) is perpendicular to \(\u\) (and also to \(\v\)):

\[\u\cd(\u\times\v)=u_1(u_2v_3-u_3v_2)+u_2(u_3v_1-u_1v_3)+u_3(u_1v_2-u_2v_1)=0.\]

The determinant for \(\u\cd(\u\times\v)\) has rows \(\u,\u,\v\) so it is zero.

Property 3: The cross product of any vector with itself (two equal rows) is \(\u\times\u=\0\).

When \(\u\) and \(\v\) are parallel, the cross product is zero. When \(\u\) and \(\v\) are perpendicular, the dot product is zero. One involves \(\sin\theta\) and the other involves \(\cos\theta\):

Note

\(\lv\u\times\v\rv=\lv\u\rv\lv\v\rv|\sin\theta|\quad\rm{and}\quad|\u\cd\v|=\lv\u\rv\lv\v\rv|\cos\theta|\).

The length of \(\u\times\v\) equals the area of the parallelogram with sides \(\u\) and \(\v\).

Right hand rule: \(\u\times\v\) points along your right thumb when the fingers curl from \(\u\) to \(\v\). The right hand rule gives \(\i\times\j=\bs{k}, \j\times\bs{k}=\i, \bs{k}\times\i=\j\). In the opposite order (anti-cyclic) the thumb is reversed and the cross product goes the other way: \(\bs{k}\times\j=-\i,\i\times\bs{k}=-\j,\j\times\i=-\bs{k}\).

Note

DEFINITION: The cross product is a vector with length \(\lv\u\rv\lv\v\rv|\sin\theta|\). Its direction is perpendicular to \(\u\) and \(\v\). It points “up” or “down” by the right hand rule.

Triple Product = Determinant = Volume

Since \(\u\times\v\) is a vector, we can take its dot product with a third vector \(\w\). That produces the triple product \((\u\times\v)\cd\w\). It is called a “scalar” triple product, because it is a number. In fact it is a determinant–it gives the volume of the \(\u,\v,\w\) box:

Triple product:

\[\begin{split}(\u\times\v)\cd\w=\bv w_1&w_2&w_3\\u_1&u_2&u_3\\v_1&v_2&v_3 \ev=\bv u_1&u_2&u_3\\v_1&v_2&v_3\\w_1&w_2&w_3 \ev.\end{split}\]

We can put \(\w\) in the top or bottom row. The two determinants are the same because 2 row exchanges go from one to the other. Notice when this determinant is zero: \((\u\times\v)\cd\w=0\) exactly when the vectors \(\u,\v,\w\) lie in the same plane.

First reason: \(\u\times\v\) is perpendicular to that plane so its dot product with \(\w\) is zero.

Second reason: Three vectors in a plane are dependent. The matrix is singular (\(\det =0\)).

Third reason: Zero volume when the \(\u,\v,\w\) box is squashed onto a plane.
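These properties are easy to confirm numerically. A short sketch (assuming NumPy; the three vectors are arbitrary):

    import numpy as np

    u = np.array([1.0, 2.0, 0.0])
    v = np.array([0.0, 1.0, 3.0])
    w = np.array([2.0, 0.0, 1.0])

    uxv = np.cross(u, v)
    print(np.dot(u, uxv), np.dot(v, uxv))       # both 0: u x v is perpendicular to u and v

    triple = np.dot(uxv, w)                     # (u x v) . w
    box = np.linalg.det(np.array([u, v, w]))    # determinant with rows u, v, w
    print(np.isclose(triple, box))              # True: triple product = determinant; |det| = box volume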

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 6 Eigenvalues and Eigenvectors

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 6.1 Introduction to Eigenvalues

The key idea is to avoid all the complications presented by the matrix \(A\). Suppose the solution vector \(\u(t)\) stays in the direction of a fixed vector \(\x\). Then we only need to find the number (changing with time) that multiplies \(\x\). A number is easier than a vector. We want “eigenvectors” \(\x\) that don’t change direction when you multiply by \(A\).

Almost all vectors change direction, when they are multiplied by \(A\). Certain exceptional vectors \(\x\) are in the same direction as \(A\x\). Those are the “eigenvectors”. Multiply an eigenvector by \(A\), and the vector \(A\x\) is a number \(\ld\) times the original \(\x\).

Tip

The basic equation is \(A\x=\ld\x\). The number \(\ld\) is an eigenvalue of \(A\).

The eigenvalue \(\ld\) tells whether the special vector \(\x\) is stretched or shrunk or reversed or left unchanged–when it is multiplied by \(A\). The eigenvalue \(\ld\) could be zero. Then \(A\x=0\x\) means that this eigenvector \(\x\) is in the nullspace.

If \(A\) is the identity matrix, every vector has \(A\x=\x\). All vectors are eigenvectors of \(I\). All eigenvalues “lambda” are \(\ld=1\). Most 2 by 2 matrices have two eigenvector directions and two eigenvalues. We will show that \(\det(A-\ld I)=0\).

The matrix \(A=\bb .8&.3\\.2&.7 \eb\) has two eigenvalues \(\ld=1\) and \(\ld=1/2\). The eigenvectors \(\x_1=(.6,.4)\) and \(\x_2=(1,-1)\) are in the nullspaces of \(A-I\) and \(A-\frac{1}{2}I\).

If \(\x_1\) is multiplied again by \(A\), we still get \(\x_1\). Every power of \(A\) will give \(A^n\x_1=\x_1\). Multiplying \(\x_2\) by \(A\) gave \(\frac{1}{2}\x_2\), and if we multiply again we get \((\frac{1}{2})^2\) times \(\x_2\).

Tip

When \(A\) is squared, the eigenvectors stay the same. The eigenvalues are squared.

Separate into eigenvectors then multiply by \(A\):

\[\begin{split}\bb .8\\.2 \eb=\x_1+(.2)\x_2=\bb .6\\.4 \eb+\bb .2\\-.2 \eb.\end{split}\]

When we multiply \(\x_1\) and \((.2)\x_2\) separately by \(A\), \(\x_1\) is multiplied by its eigenvalue \(1\) and \(\x_2\) is multiplied by its eigenvalue \(\frac{1}{2}\):

Multiply each \(\x_i\) by \(\ld_i\):

\[\begin{split}A\bb .8\\.2 \eb\rm{ \ is\ }\x_1+\frac{1}{2}(.2)\x_2=\bb .6\\.4 \eb+\bb .1\\-.1 \eb=\bb .7\\.3 \eb.\end{split}\]

Each eigenvector is multiplied by its eigenvalue, when we multiply by \(A\).

\[\begin{split}A^{99}\bb .8\\.2 \eb\rm{\ is\ really\ }\x_1+(.2)(\frac{1}{2})^{99}\x_2= \bb .6\\.4 \eb+\bb \rm{very}\\\rm{small}\\\rm{vector} \eb.\end{split}\]

The eigenvector \(\x_1\) is a “steady state” that doesn’t change (because \(\ld_1=1\)). The eigenvector \(\x_2\) is a “decaying mode” that virtually disappears (because \(\ld_2=.5\)). The higher the power of \(A\), the more closely its columns approach the steady state.

The particular \(A\) is a Markov matrix. Its largest eigenvalue is \(\ld=1\). Its eigenvector \(\x_1=(.6,.4)\) is the steady state–which all columns of \(A^k\) will approach.
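A quick check of this Markov example (assuming NumPy; the matrix is the one above):

    import numpy as np

    A = np.array([[0.8, 0.3],
                  [0.2, 0.7]])                  # Markov matrix: each column adds to 1

    print(np.linalg.eigvals(A))                 # 1.0 and 0.5
    print(np.linalg.matrix_power(A, 100))       # both columns are very close to the steady state (0.6, 0.4)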

For projection matrices \(P\), we can see when \(P\x\) is parallel to \(\x\). The eigenvectors for \(\ld=1\) and \(\ld=0\) fill the column space and nullspace. The column space doesn’t move (\(P\x=\x\)). The nullspace goes to zero (\(P\x=0\x\)).

Note

The projection matrix \(P=\bb .5&.5\\.5&.5 \eb\) has eigenvalues \(\ld=1\) and \(\ld=0\).

Its eigenvectors are \(\x_1=(1,1)\) and \(\x_2=(1,-1)\). For those vectors, \(P\x_1=\x_1\) (steady state) and \(P\x_2=\0\) (nullspace). This example illustrates Markov matrices and singular matrices and (most important) symmetric matrices. All have special \(\ld\)’s and \(\x\)’s:

  1. Markov matrix: Each column of \(P\) adds to 1, so \(\ld=1\) is an eigenvalue.

  2. \(P\) is singular, so \(\ld=0\) is an eigenvalue.

  3. \(P\) is symmetric, so its eigenvectors \((1,1)\) and \((1,-1)\) are perpendicular.

The only eigenvalues of a projection matrix are 0 and 1. The eigenvectors for \(\ld=0\) (which means \(P\x=0\x\)) fill up the nullspace. The eigenvectors for \(\ld=1\) (which means \(P\x=\x\)) fill up the column space. The nullspace is projected to zero. The column space projects onto itself. The projection keeps the column space and destroys the nullspace:

Project each part: \(\v=\bb 1\\-1 \eb+\bb 2\\2 \eb\) projects onto \(P\v=\bb 0\\0 \eb+\bb 2\\2 \eb\).

Projections have \(\ld=0\) and \(1\). Permutations have all \(|\ld|=1\). The next matrix \(R\) is a reflection and at the same time a permutation. \(R\) also has special eigenvalues.

Note

The reflection matrix \(R=\bb 0&1\\1&0 \eb\) has eigenvalues 1 and -1.

The eigenvector \((1,1)\) is unchanged by \(R\). The second eigenvector is \((1,-1)\)–its signs are reversed by \(R\). A matrix with no negative entries can still have a negative eigenvalue. The eigenvectors for \(R\) are the same as for \(P\), because \(reflection=2(projection)-I\):

\(R=2P-I\):

\[\begin{split}\bb 0&1\\1&0 \eb=2\bb .5&.5\\.5&.5 \eb-\bb 1&0\\0&1 \eb.\end{split}\]

When a matrix is shifted by \(I\), each \(\ld\) is shifted by 1. No change in eigenvectors.

The Equation for the Eigenvalues

For projection matrices we found \(\ld\)’s and \(\x\)’s by geometry: \(P\x=\x\) and \(P\x=\0\). For other matrices we use determinants and linear algebra.

First move \(\ld\x\) to the left side. Write the equation \(A\x=\ld\x\) as \((A-\ld I)\x=\0\). The matrix \(A-\ld I\) times the eigenvector \(\x\) is the zero vector. The eigenvectors make up the nullspace of \(A-\ld I\). When we know an eigenvalue \(\ld\), we find an eigenvector by solving \((A-\ld I)\x=\0\).

If \((A-\ld I)\x=\0\) has a nonzero solution, \(A-\ld I\) is not invertible. The determinant of \(A-\ld I\) must be zero. This is how to recognize an eigenvalue \(\ld\):

Note

Eigenvalues: The number \(\ld\) is an eigenvalue of \(A\) if and only if \(A-\ld I\) is singular.

  • Equation for the eigenvalues: \(\det (A-\ld I)=0\).

This “characteristic polynomial\(\det(A-\ld I)\) involves only \(\ld\), not \(\x\). When \(A\) is \(n\) by \(n\), \(\det(A-\ld I)\) has degree \(n\). Then \(A\) has \(n\) eigenvalues (repeats possible). Each \(\ld\) leads to \(\x\):

Note

For each eigenvalue \(\ld\) solve \((A-\ld I)\x=\0\) or \(A\x=\ld\x\) to find an eigenvector \(\x\).

Summary: To solve the eigenvalue problem for an \(n\) by \(n\) matrix, follow these steps:

Note

1. Compute the determinant of \(A-\ld I\). With \(\ld\) subtracted along the diagonal, this determinant starts with \(\ld^n\) or \(-\ld^n\). It is a polynomial in \(\ld\) of degree \(n\).

2. Find the roots of this polynomial, by solving \(\det(A-\ld I)=0\). The \(n\) roots are the \(n\) eigenvalues of \(A\). They make \(A-\ld I\) singular.

3. For each eigenvalue \(\ld\), solve \((A-\ld I)\x=\0\) to find an eigenvector \(\x\).
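A minimal sketch of those three steps for the 2 by 2 Markov matrix above (assuming NumPy; np.poly supplies the characteristic polynomial and the SVD supplies a nullspace vector):

    import numpy as np

    A = np.array([[0.8, 0.3],
                  [0.2, 0.7]])

    # Steps 1-2: the characteristic polynomial det(A - lambda I) and its roots
    coeffs = np.poly(A)                  # [1, -1.5, 0.5]  for  lambda^2 - 1.5 lambda + 0.5
    lambdas = np.roots(coeffs)           # eigenvalues 1.0 and 0.5

    # Step 3: an eigenvector spans the nullspace of A - lambda I
    for ld in lambdas:
        x = np.linalg.svd(A - ld * np.eye(2))[2][-1]   # right singular vector for sigma = 0
        print(ld, x, np.allclose(A @ x, ld * x))       # A x = lambda x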

A note on the eigenvectors of 2 by 2 matrices. When \(A-\ld I\) is singular, both rows are multiples of a vector \((a,b)\). The eigenvector is any multiple of \((b,-a)\). There is a whole line of eigenvectors–any nonzero multiple of \(\x\) is as good as \(\x\).

Some 2 by 2 matrices have only one line of eigenvectors. This can only happen when two eigenvalues are equal. (On the other hand \(A=I\) has equal eigenvalues and plenty of eigenvectors.) Without a full set of eigenvectors, we don’t have a basis. We can’t write every \(\v\) as a combination of eigenvectors.

Determinant and Trace

Elimination does not preserve the \(\ld\)’s. The triangular \(U\) has its eigenvalues sitting along the diagonal–they are the pivots. But they are not the eigenvalues of \(A\)!

The product \(\ld_1\) times \(\ld_2\) and the sum \(\ld_1+\ld_2\) can be found quickly from the matrix.

Note

The product of the \(n\) eigenvalues equals the determinant. The sum of the \(n\) eigenvalues equals the sum of the \(n\) diagonal entries.

The sum of the entries along the main diagonal is called the trace of \(A\):

Note

\(\ld_1+\ld_2+\cds+\ld_n=\bs{trace}=a_{11}+a_{22}+\cds+a_{nn}\).

Tip

Why do the eigenvalues of a triangular matrix lie along its diagonal?

Imaginary Eigenvalues

The eigenvalues might not be real numbers.

Note

The \(90^{\circ}\) rotation \(Q=\bb 0&-1\\1&0 \eb\) has no real eigenvectors. Its eigenvalues are \(\ld_1=i\) and \(\ld_2=-i\). Then \(\ld_1+\ld_2=trace=0\) and \(\ld_1\ld_2=determinant=1\).

After a rotation, no real vector \(Q\x\) stays in the same direction as \(\x\) (\(\x=\0\) is useless). There cannot be an eigenvector, unless we go to imaginary numbers.

To see how \(i=\sqrt{-1}\) can help, look at \(Q^2\) which is \(-I\). If \(Q\) is rotation through \(90^{\circ}\), then \(Q^2\) is rotation through \(180^{\circ}\). Its eigenvalues are \(-1\) and \(-1\). (Certainly \(-I\x=-1\x\).) Squaring \(Q\) will square each \(\ld\), so we must have \(\ld^2=-1\). The eigenvalues of the \(90^\circ\) rotation matrix \(Q\) are \(+i\) and \(-i\), because \(i^2=-1\).

Those \(\ld\)’s come as usual from \(\det(Q-\ld I)=0\). This equation gives \(\ld^2+1=0\). Its roots are \(i\) and \(-i\). We meet the imaginary number \(i\) also in the eigenvectors:

Complex eigenvectors:

\[\begin{split}\bb 0&-1\\1&0 \eb\bb 1\\i \eb=-i\bb 1\\i \eb\quad\rm{and}\quad\bb 0&-1\\1&0 \eb\bb i\\1 \eb=i\bb i\\1 \eb.\end{split}\]

Somehow these complex vectors \(\x_1=(1,i)\) and \(\x_2=(i,1)\) keep their direction as they are rotated. The particular eigenvalues \(i\) and \(-i\) also illustrate two special properties of \(Q\):

  1. \(Q\) is an orthogonal matrix so the absolute value of each \(\ld\) is \(|\ld|=1\).

  2. \(Q\) is a skew-symmetric matrix so each \(\ld\) is pure imaginary.

A symmetric matrix (\(S^T=S\)) can be compared to a real number. A skew-symmetric matrix (\(A^T=-A\)) can be compared to an imaginary number. An orthogonal matrix (\(Q^TQ=I\)) corresponds to a complex number with \(|\ld|=1\). For the eigenvalues of \(S\) and \(A\) and \(Q\), those are more than analogies.

The eigenvectors for all these special matrices are perpendicular.

Eigenvalues of \(AB\) and \(A+B\)

Note

\(A\) and \(B\) share the same \(n\) independent eigenvectors if and only if \(AB=BA\).

Heisenberg’s uncertainty principle: In quantum mechanics, the position matrix \(P\) and the momentum matrix \(Q\) do not commute. In fact \(QP-PQ=I\) (these are infinite matrices).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 6.2 Diagonalizing a Matrix

When \(\x\) is an eigenvector, multiplication by \(A\) is just multiplication by a number \(\ld\): \(A\x=\ld\x\).

The point of this section is very direct. The matrix \(A\) turns into a diagonal matrix \(\Ld\) when we use the eigenvectors properly.

Note

Diagonalization: Suppose the \(n\) by \(n\) matrix \(A\) has \(n\) linearly independent eigenvectors \(\x_1,\cds,\x_n\). Put them into the columns of an eigenvector matrix \(X\). Then \(X\im AX\) is the eigenvalue matrix \(\Ld\):

  • Eigenvector matrix \(X\), Eigenvalue matrix \(\Ld\): \(X\im AX=\Ld=\bb \ld_1\\&\dds\\&&\ld_n \eb\).

The matrix \(A=\bb 1&5\\0& 6\eb\) is triangular so its eigenvalues are on the diagonal: \(\ld_1=1,\ld_2=6\).

Note

Eigenvectors go into \(X\):

  • \(\ld_1=1\) has \(\x_1=\bb 1\\0 \eb\) and \(\ld_2=6\) has \(\x_2=\bb 1\\1 \eb\), so \(X=\bb 1&1\\0&1 \eb\).

\[\begin{split}\bb 1&-1\\0&1 \eb\bb 1&5\\0&6 \eb\bb 1&1\\0&1 \eb=\bb 1&0\\0&6 \eb.\end{split}\]

In other words \(A=X\Ld X\im\). Then \(A^2=X\Ld X\im X\Ld X\im=X\Ld^2X\im\).

\(A^2\) has the same eigenvectors in \(X\) and squared eigenvalues in \(\Ld^2\).

Why is \(AX=X\Ld\)? \(A\) multiplies its eigenvectors, which are the columns of \(X\). The first column of \(AX\) is \(A\x_1\). That is \(\ld_1\x_1\). Each column of \(X\) is multiplied by its eigenvalue:

\(A\) times \(X\):

\[\begin{split}AX=A\bb \\\ \x_1&\cds&\x_n \\\ \eb=\bb \\\ \ld_1\x_1&\cds&\ld_n\x_n \\\ \eb.\end{split}\]

The trick is to split this matrix \(AX\) into \(X\) times \(\Ld\):

\(X\) times \(\Ld\):

\[\begin{split}\bb \\\ \ld_1\x_1&\cds&\ld_n\x_n \\\ \eb=\bb \\\ \x_1&\cds&\x_n \\\ \eb\bb \ld_1\\&\dds\\&&\ld_n \eb=X\Ld.\end{split}\]

Keep those matrices in the right order. Then \(\ld_1\) multiplies the first column \(\x_1\), as shown. The diagonalization is complete, and we can write \(AX=X\Ld\) in two good ways:

Note

\(AX=X\Ld\) is \(X\im AX=\Ld\) or \(A=X\Ld X\im\).

The matrix \(X\) has an inverse, because its columns (the eigenvectors of \(A\)) were assumed to be linearly independent. Without \(n\) independent eigenvectors, we can’t diagonalize.

\(A\) and \(\Ld\) have the same eigenvalues \(\ld_1,\cds,\ld_n\). The eigenvectors are different. The \(k\)th power \(A^k=X\Ld^kX\im\) is easy to compute:

\[A^k=(X\Ld X\im)(X\Ld X\im)\cds(X\Ld X\im)=X\Ld^kX\im.\]

Powers of \(A\):

\[\begin{split}\bb 1&5\\0&6 \eb^k=\bb 1&1\\0&1 \eb\bb 1\\&6^k \eb\bb 1&-1\\0&1 \eb=\bb 1&6^k-1\\0&6^k \eb=A^k.\end{split}\]
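The same computation can be sketched numerically (assuming NumPy; np.linalg.eig supplies \(\Ld\) and \(X\) for the matrix above):

    import numpy as np

    A = np.array([[1.0, 5.0],
                  [0.0, 6.0]])

    vals, X = np.linalg.eig(A)                           # eigenvalues 1, 6 and eigenvectors in X
    k = 5
    Ak_eig = X @ np.diag(vals ** k) @ np.linalg.inv(X)   # X Lambda^k X^{-1}
    print(np.allclose(Ak_eig, np.linalg.matrix_power(A, k)))   # True
    print(Ak_eig)                                        # [[1, 6^5 - 1], [0, 6^5]]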

Remark 1: Suppose the eigenvalues \(\ld_1,\cds,\ld_n\) are all different. Then it is automatic that the eigenvectors \(\x_1,\cds,\x_n\) are independent. The eigenvector matrix \(X\) will be invertible. Any matrix that has no repeated eigenvalues can be diagonalized.

Remark 2: We can multiply eigenvectors by any nonzero constants. \(A(c\x)=\ld(c\x)\) is still true.

Remark 3: The eigenvectors in \(X\) come in the same order as the eigenvalues in \(\Ld\). To reverse the order in \(\Ld\), put the eigenvector \((1,1)\) before \((1,0)\) in \(X\):

New order \(6,1\):

\[\begin{split}\bb 0&1\\1&-1 \eb\bb 1&5\\0&6 \eb\bb 1&1\\1&0 \eb=\bb 6&0\\0&1 \eb=\Ld_{\rm{new}}.\end{split}\]

To diagonalize \(A\) we must use an eigenvector matrix. From \(X\im AX=\Ld\) we know that \(AX=X\Ld\). Suppose the first column of \(X\) is \(\x\). Then the first columns of \(AX\) and \(X\Ld\) are \(A\x\) and \(\ld_1\x\). For those to be equal, \(\x\) must be an eigenvector.

Remark 4 (repeated warning for repeated eigenvalues): Some matrices have too few eigenvectors. Those matrices cannot be diagonalized.

Not diagonalizable:

\[\begin{split}A=\bb 1&-1\\1&-1 \eb\quad \rm{and} \quad B=\bb 0&1\\0&1 \eb.\end{split}\]

Their eigenvalues happen to be 0 and 0. Nothing is special about \(\ld=0\), the problem is the repetition of \(\ld\). All eigenvectors of the first matrix are multiples of \((1,1)\):

Only one line of eigenvectors:

\[\begin{split}A\x=0\x \quad\rm{means}\quad \bb 1&-1\\1&-1 \eb\bb \\\ \x\\\ \eb=\bb 0\\0 \eb\quad\rm{and}\quad \x=c\bb 1\\1 \eb.\end{split}\]

There is no second eigenvector, so this unusual matrix \(A\) cannot be diagonalized.

  • Invertibility is concerned with the eigenvalues (\(\ld=0\) or \(\ld\neq 0\)).

  • Diagonalizability is concerned with the eigenvectors (too few or enough for \(X\)).

Each eigenvalue has at least one eigenvector. \(A-\ld I\) is singular. If \((A-\ld I)\x=\0\) leads you to \(\x=\0\), \(\ld\) is not an eigenvalue. Look for a mistake in solving \(\det(A-\ld I)=0\).

Tip

Eigenvectors for \(n\) different \(\ld\)‘s are independent. Then we can diagonalize \(A\).

Note

Independent \(\x\) from different \(\ld\): Eigenvectors \(\x_1,\cds,\x_j\) that correspond to distinct (all different) eigenvalues are linearly independent. An \(n\) by \(n\) matrix that has \(n\) different eigenvalues (no repeated \(\ld\)’s) must be diagonalizable.

Proof: Suppose \(c_1\x_1+c_2\x_2=\0\). Multiply by \(A\) to find \(c_1\ld_1\x_1+c_2\ld_2\x_2=\0\). Multiply by \(\ld_2\) to find \(c_1\ld_2\x_1+c_2\ld_2\x_2=\0\). Now subtract one from the other: Subtraction leaves \((\ld_1-\ld_2)c_1\x_1=\0\). Therefore \(c_1=0\).

Since the \(\ld\)’s are different and \(\x_1\neq\0\), we are forced to the conclusion that \(c_1=0\). Similarly \(c_2=0\). Only the combination with \(c_1=c_2=0\) gives \(c_1\x_1+c_2\x_2=\0\). So the eigenvectors \(\x_1\) and \(\x_2\) must be independent.

This proof extends directly to \(j\) eigenvectors. Suppose that \(c_1\x_1+\cds+c_j\x_j=\0\). Multiply by \(A\), multiply by \(\ld_j\) and subtract. This multiplies \(\x_j\) by \(\ld_j-\ld_j=0\), and \(\x_j\) is gone. Now multiply by \(A\) and by \(\ld_{j-1}\) and subtract. This removes \(\x_{j-1}\). Eventually only \(\x_1\) is left: We reach \((\ld_1-\ld_2)\cds(\ld_1-\ld_j)c_1\x_1=\0\) which forces \(c_1=0\).

Similarly every \(c_i=0\). When the \(\ld\)’s are all different, the eigenvectors are independent. A full set of eigenvectors can go into the columns of the eigenvector matrix \(X\).

Note

Question: When does \(A^k\rightarrow zero\ matrix\)? Answer: All \(|\ld|<1\).

Similar Matrices: Same Eigenvalues

Suppose the eigenvalue matrix \(\Ld\) is fixed. As we change the eigenvector matrix \(X\), we get a whole family of different matrices \(A=X\Ld X\im\), all with the same eigenvalues in \(\Ld\). All those matrices \(A\) (with the same \(\Ld\)) are called similar.

This idea extends to matrices that can’t be diagonalized. Again we choose one constant matrix \(C\) (not necessarily \(\Ld\)). And we look at the whole family of matrices \(A=BCB\im\), allowing all invertible matrices \(B\). Again those matrices \(A\) and \(C\) are called similar.

We are using \(C\) instead of \(\Ld\) because \(C\) might not be diagonal. We are using \(B\) instead of \(X\) because the columns of \(B\) might not be eigenvectors. We only require that \(B\) is invertible–its columns can contain any basis for \(\R^n\). Similar matrices \(A\) and \(C\) have the same eigenvalues.

Note

All the matrices \(A=BCB\im\) are “similar”. They all share the eigenvalues of \(C\).

Proof: Suppose \(C\x=\ld\x\). Then \(BCB\im\) has the same eigenvalue \(\ld\) with the new eigenvector \(B\x\):

\[Same\ \ld\quad(BCB\im)(B\x)=BC\x=B\ld\x=\ld(B\x).\]

A fixed matrix \(C\) produces a family of similar matrices \(BCB\im\), allowing all \(B\). When \(C\) is the identity matrix, the “family” is very small. The only member is \(BIB\im=I\). The identity matrix is the only diagonalizable matrix with all eigenvalues \(\ld=1\).

The family is larger when \(\ld=1\) and \(1\) with only one eigenvector (not diagonalizable). The simplest \(C\) is the Jordan form. All the similar \(A\)’s have two parameters \(r\) and \(s\), not both zero: always determinant = 1 and trace = 2.

\[\begin{split}C=\bb 1&1\\0&1 \eb=\rm{\ Jordan\ form\ gives\ }A=BCB\im=\bb 1-rs&r^2\\-s^2&1+rs \eb.\end{split}\]

Fibonacci numbers

Every new Fibonacci number is the sum of the two previous \(F\)‘s:

Note

The sequence \(0,1,1,2,3,5,8,13,\cds\) comes from \(F_{k+2}=F_{k+1}+F_k\).

Problem: Find the Fibonacci number \(F_{100}\): The slow way is to apply the rule \(F_{k+2}=F_{k+1}+F_k\). Linear algebra gives a better way.

The key is to begin with a matrix equation \(\u_{k+1}=A\u_k\). That is a one-step rule for vectors, while Fibonacci gave a two-step rule for scalars. We match those rules by putting two Fibonacci numbers into a vector. Then you will see the matrix \(A\).

Note

Let \(\u_k=\bb F_{k+1}\\F_k \eb\). The rule \(\begin{matrix}F_{k+2}=F_{k+1}+F_k\\F_{k+1}=F_{k+1}\quad\quad \end{matrix}\) is \(\u_{k+1}=\bb 1&1\\1&0 \eb\u_k\).

Every step multiplies by \(A=\bb 1&1\\1&0 \eb\). After 100 steps we reach \(\u_{100}=A^{100}\u_0\):

\[\begin{split}\u_0=\bb 1\\0 \eb, \u_1=\bb 1\\1 \eb, \u_2=\bb 2\\1 \eb, \u_3=\bb 3\\2 \eb, \cds, \u_{100}=\bb F_{101}\\F_{100} \eb.\end{split}\]

Subtract \(\ld\) from the diagonal of \(A\):

\[\begin{split}A-\ld I=\bb 1-\ld&1\\1&-\ld \eb\quad\rm{leads\ to}\quad\det(A-\ld I)=\ld^2-\ld-1.\end{split}\]

Note

Eigenvalues: \(\dp\ld_1=\frac{1+\sqrt{5}}{2}\approx 1.618\) and \(\dp\ld_2=\frac{1-\sqrt{5}}{2}\approx -.618\).

These eigenvalues lead to eigenvectors \(\x_1=(\ld_1,1)\) and \(\x_2=(\ld_2,1)\). Step 2 finds the combination of those eigenvectors that gives \(\u_0=(1,0)\):

\[\begin{split}\bb 1\\0 \eb=\frac{1}{\ld_1-\ld_2}\left(\bb\ld_1\\1\eb-\bb\ld_2\\1\eb\right) \quad\rm{or}\quad\u_0=\frac{\x_1-\x_2}{\ld_1-\ld_2}.\end{split}\]

Step 3 multiplies \(\u_0\) by \(A^{100}\) to find \(\u_{100}\). The eigenvectors \(\x_1\) and \(\x_2\) stay separate! They are multiplied by \((\ld_1)^{100}\) and \((\ld_2)^{100}\):

100 steps from \(\u_0\):

\[\u_{100}=\frac{(\ld_1)^{100}\x_1-(\ld_2)^{100}\x_2}{\ld_1-\ld_2}.\]

We want \(F_{100}=\) second component of \(\u_{100}\). The second components of \(\x_1\) and \(\x_2\) are 1. The difference between \(\ld_1=(1+\sqrt{5})/2\) and \(\ld_2=(1-\sqrt{5})/2\) is \(\sqrt{5}\). And \(\ld_2^{100}\approx 0\).

100th Fibonacci number \(=\dp\frac{\ld_1^{100}-\ld_2^{100}}{\ld_1-\ld_2}=\) nearest integer to \(\dp\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{100}\).
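A short check of this formula (plain Python integers for the slow recursion; the eigenvalue formula uses floating point, so very large \(k\) would need extra precision):

    import numpy as np

    # Slow way: the two-step rule F_{k+2} = F_{k+1} + F_k with exact integers
    F = [0, 1]
    for _ in range(99):
        F.append(F[-1] + F[-2])

    # Fast way: F_k = (lambda_1^k - lambda_2^k) / (lambda_1 - lambda_2)
    lam1 = (1 + np.sqrt(5)) / 2
    lam2 = (1 - np.sqrt(5)) / 2
    k = 30
    print(F[k], round((lam1**k - lam2**k) / (lam1 - lam2)))   # both print 832040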

Every \(F_k\) is a whole number. The ratio \(F_{101}/F_{100}\) must be very close to the limiting ratio \((1+\sqrt{5})/2\). This number is the “golden mean”.

Matrix Powers \(A^k\)

Fibonacci’s example is a typical difference equation \(\u_{k+1}=A\u_k\). Each step multiplies by \(A\). The solution is \(\u_k=A^k\u_0\).

The eigenvector matrix \(X\) produces \(A=X\Ld X\im\). This is a factorization of the matrix, like \(A=LU\) or \(A=QR\). The new factorization is perfectly suited to computing powers, because every time \(X\im\) multiplies \(X\) we get \(I\):

Powers of \(A\):

\[A^k\u_0=(X\Ld X\im)\cds(X\Ld X\im)\u_0=X\Ld^kX\im\u_0.\]

I will split \(X\Ld^kX\im\u_0\) into three steps that show how eigenvalues work:

  1. Write \(\u_0\) as a combination \(c_1\x_1+\cds+c_n\x_n\) of the eigenvectors. Then \(\bs{c}=X\im\u_0\).

  2. Multiply each eigenvector \(\x_i\) by \((\ld_i)^k\). Now we have \(\Ld^kX\im\u_0\).

  3. Add up the pieces \(c_i(\ld_i)^k\x_i\) to find the solution \(\u_k=A^k\u_0\). This is \(X\Ld^kX\im\u_0\).

Note

Solution for \(\u_{k+1}=A\u_k\): \(\u_k=A^k\u_0=c_1(\ld_1)^k\x_1+\cds+c_n(\ld_n)^k\x_n\).

In matrix language \(A^k\) equals \((X\Ld X\im)^k\) which is \(X\) times \(\Ld^k\) times \(X\im\). In Step 1, the eigenvectors in \(X\) lead to the \(c\)’s in the combination \(\u_0=c_1\x_1+\cds+c_n\x_n\):

Step 1: \(\u_0=\bb \\\ \x_1&\cds&\x_n\\\ \eb\bb c_1\\\vds\\c_n \eb\). This says that \(\u_0=X\bs{c}\).

The coefficients in Step 1 are \(\bs{c}=X\im\u_0\). Then Step 2 multiplies by \(\Ld^k\). The final result \(\u_k=\sum c_i(\ld_i)^k\x_i\) in Step 3 is the product of \(X\) and \(\Ld^k\) and \(X\im\u_0\):

\[\begin{split}A^k\u_0=X\Ld^kX\im\u_0=X\Ld^k\bs{c}=\bb \\\ \x_1&\cds&\x_n \\\ \eb \bb (\ld_1)^k\\&\dds\\&&(\ld_n)^k \eb\bb c_1\\\vds\\c_n \eb.\end{split}\]

The result is exactly \(\u_k=c_1(\ld_1)^k\x_1+\cds+c_n(\ld_n)^k\x_n\). It solves \(\u_{k+1}=A\u_k\).

Nondiagonalizable Matrices (Optional)

Suppose \(\ld\) is an eigenvalue of \(A\). We discover that fact in two ways:

1. Eigenvectors (geometric): There are nonzero solutions to \(A\x=\ld\x\).

2. Eigenvalues (algebraic): The determinant of \(A-\ld I\) is zero.

The number \(\ld\) may be a simple eigenvalue or a multiple eigenvalue, and we want to know its multiplicity. Most eigenvalues have multiplicity \(M=1\) (simple eigenvalues). Then there is a single line of eigenvectors, and \(\det(A-\ld I)\) does not have a double factor.

For exceptional matrices, an eigenvalue can be repeated. Then there are two different ways to count its multiplicity. Always GM \(\leq\) AM for each \(\ld\):

  1. (Geometric Multiplicity = GM): Count the independent eigenvectors for \(\ld\). The GM is the dimension of the nullspace of \(A-\ld I\).

  2. (Algebraic Multiplicity = AM): AM counts the repetitions of \(\ld\) among the eigenvalues. Look at the \(n\) roots of \(\det(A-\ld I)=0\).

If \(A\) has \(\ld=4,4,4\), then that eigenvalue has AM = 3 and GM = 1, 2, or 3.

The shortage of eigenvectors when GM is below AM means that \(A\) is not diagonalizable.


If \(A\) is \(m\) by \(n\) and \(B\) is \(n\) by \(m\), then \(AB\) and \(BA\) have the same nonzero eigenvalues.

Proof: Start with this identity between square matrices (easily checked). The first and third matrices are inverses. The “size matrix” shows the shapes of all blocks.

\[\begin{split}\bb I&-A\\0&I \eb\bb AB&0\\B&0 \eb\bb I&A\\0&I \eb=\bb 0&0\\B&BA \eb\quad \bb m \times m&m \times n\\n \times m&n \times n \eb\end{split}\]

This equation \(D\im ED=F\) says \(F\) is similar to \(E\)–they have the same \(m+n\) eigenvalues.

\(E=\bb AB&0\\B&0 \eb\) has the \(m\) eigenvalues of \(AB\), plus \(n\) zeros

\(F=\bb 0&0\\B&BA \eb\) has the \(n\) eigenvalues of \(BA\), plus \(m\) zeros

So \(AB\) and \(BA\) have the same eigenvalues except for \(|n-m|\) zeros.
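A quick random test of this fact (assuming NumPy; the shapes \(m=2\), \(n=3\) are arbitrary, and tiny nonzero values in the output are numerical noise):

    import numpy as np

    m, n = 2, 3
    A = np.random.rand(m, n)
    B = np.random.rand(n, m)

    print(np.sort(np.linalg.eigvals(A @ B)))   # m eigenvalues
    print(np.sort(np.linalg.eigvals(B @ A)))   # the same nonzero eigenvalues plus |n - m| zeros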

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 6.3 Systems of Differential Equations

The derivative of \(e^{\ld t}\) is \(\ld e^{\ld t}\). The whole point of the section is this: To convert constant-coefficient differential equations into linear algebra.

The ordinary equations \(\dp\frac{du}{dt}=u\) and \(\dp\frac{du}{dt}=\ld u\) are solved by exponentials:

\[\frac{du}{dt}=u\rm{\ produces\ }u(t)=Ce^t\quad\frac{du}{dt}=\ld u\rm{\ produces\ }u(t)=Ce^{\ld t}.\]

At time \(t=0\) those solutions include \(e^0=1\). So they both reduce to \(u(0)=C\). This “initial value” tells us the right choice for \(C\). The solutions that start from the number \(u(0)\) at time \(t=0\) are \(u(t)=u(0)e^t\) and \(u(t)=u(0)e^{\ld t}\).

Linear algebra moves from 1 by 1 problems to \(n\) by \(n\). The unknown is a vector \(\u\). It starts from the initial vector \(\u(0)\), which is given. The \(n\) equations contain a square matrix \(A\). We expect \(n\) exponents \(e^{\ld t}\) in \(\u(t)\), from \(n\ \ld\)’s:

Note

System of \(n\) equations: \(\dp\frac{d\u}{dt}=A\u\) starting from the vector \(\u(0)=\bb u_1(0)\\\vds\\u_n(0) \eb\) at \(t=0\).

These differential equations are linear. If \(\u(t)\) and \(\v(t)\) are solutions, so is \(C\u(t)+D\v(t)\). We will need \(n\) constants like \(C\) and \(D\) to match the \(n\) components of \(\u(0)\). Our first job is to find \(n\) “pure exponential solutions” \(\u=e^{\ld t}\x\) by using \(A\x=\ld\x\).

Notice that \(A\) is a constant matrix. In other linear equations, \(A\) changes as \(t\) changes. In nonlinear equations, \(A\) changes as \(\u\) changes. \(d\u/dt=A\u\) is “linear with constant coefficients”.

Tip

Solve linear constant coefficient equations by exponentials \(e^{\ld t}\x\), when \(A\x=\ld\x\).

Solution of \(d\boldsymbol{u}/dt=A\boldsymbol{u}\)

Our pure exponential solution will be \(e^{\ld t}\) times a fixed vector \(\x\). \(\ld\) is an eigenvalue of \(A\) and \(\x\) is the eigenvector. Substitute \(\u(t)=e^{\ld t}\x\) into the equation \(d\u/dt=A\u\), and the factor \(e^{\ld t}\) will cancel to leave \(\ld\x=A\x\):

Note

Choose \(\u=e^{\ld t}\x\) when \(A\x=\ld\x\):

  • \(\dp\frac{d\u}{dt}=\ld e^{\ld t}\x\) agrees with \(A\u=Ae^{\ld t}\x\).

All components of this special solution \(\u=e^{\ld t}\x\) share the same \(e^{\ld t}\). The solution grows when \(\ld>0\). It decays when \(\ld<0\). If \(\ld\) is a complex number, its real part decides growth or decay. The imaginary part \(\omega\) gives oscillation \(e^{i\omega t}\) like a sine wave.

For \(\dp\frac{d\u}{dt}=A\u=\bb 0&1\\1&0 \eb\u\) starting from \(\u(0)=\bb 4\\2 \eb\), this is a vector equation for \(\u\). It contains two scalar equations for the components \(y\) and \(z\):

\[\begin{split}\frac{d\u}{dt}=A\u\quad\frac{d}{dt}\bb y\\z \eb=\bb 0&1\\1&0 \eb\bb y\\z \eb \rm{\ means\ that\ }\frac{dy}{dt}=z\rm{\ and\ }\frac{dz}{dt}=y.\end{split}\]

The idea of eigenvectors is to combine those equations in a way that gets back to 1 by 1 problems. The combinations \(y+z\) and \(y-z\) will do it. Add and subtract equations:

\[\frac{d}{dt}(y+z)=z+y\quad\rm{and}\quad\frac{d}{dt}(y-z)=-(y-z).\]

The combination \(y+z\) grows like \(e^t\), because it has \(\ld=1\). The combination \(y-z\) decays like \(e^{-t}\), because it has \(\ld=-1\).

This matrix \(A\) has eigenvalues 1 and -1. The eigenvectors \(\x\) are \((1,1)\) and \((1,-1)\). The pure exponential solutions \(\u_1\) and \(\u_2\) take the form \(e^{\ld t}\x\) with \(\ld_1=1\) and \(\ld_2=-1\):

Note

\(\u_1(t)=e^{\ld_1t}\x_1=e^t\bb 1\\1 \eb\) and \(\u_2(t)=e^{\ld_2t}\x_2=e^{-t}\bb 1\\-1 \eb\).

Notice: These \(\u\)’s satisfy \(A\u_1=\u_1\) and \(A\u_2=-\u_2\), just like \(\x_1\) and \(\x_2\). The factors \(e^t\) and \(e^{-t}\) change with time. Those factors give \(d\u_1/dt=\u_1=A\u_1\) and \(d\u_2/dt=-\u_2=A\u_2\). We have two solutions to \(d\u/dt=A\u\). To find all other solutions, multiply those special solutions by any numbers \(C\) and \(D\) and add:

Complete solution:

\[\begin{split}\u(t)=Ce^t\bb 1\\1 \eb+De^{-t}\bb 1\\-1 \eb=\bb Ce^t+De^{-t}\\Ce^t-De^{-t} \eb.\end{split}\]

With these two constants \(C\) and \(D\), we can match any starting vector \(\u(0)=(u_1(0),u_2(0))\). Set \(t=0\) and \(e^0=1\). For \(\u(0)=(4,2)\):

\(\u(0)\) decides \(C,D\):

\[\begin{split}C\bb 1\\1 \eb+D\bb 1\\-1 \eb= \bb 4\\2 \eb\quad\rm{yields}\quad C=3\rm{\ and\ }D=1.\end{split}\]
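The same example can be sketched numerically (assuming NumPy; eig normalizes its eigenvectors, so its coefficients differ from \(C=3,D=1\) by those scalings, but \(\u(t)\) is identical):

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    u0 = np.array([4.0, 2.0])

    vals, X = np.linalg.eig(A)           # lambda = 1, -1; eigenvectors along (1,1) and (1,-1)
    c = np.linalg.solve(X, u0)           # write u(0) as a combination X c of eigenvectors

    def u(t):
        return X @ (c * np.exp(vals * t))    # sum of c_i e^{lambda_i t} x_i

    print(u(0.0))                        # [4, 2]
    print(u(1.0))                        # equals 3 e^t (1,1) + e^{-t} (1,-1) at t = 1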

The same three steps that solved \(\u_{k+1}=A\u_k\) now solve \(d\u/dt=A\u\):

  1. Write \(\u(0)\) as a combination \(c_1\x_1+\cds+c_n\x_n\) of the eigenvectors of \(A\).

  2. Multiply each eigenvector \(\x_i\) by its growth factor \(e^{\ld_it}\).

  3. The solution is the same combination of those pure solutions \(e^{\ld t}\x\):

    Note

    \(\u(t)=c_1e^{\ld_1t}\x_1+\cds+c_ne^{\ld_nt}\x_n\).

Not included: If two \(\ld\)’s are equal, with only one eigenvector, another solution is needed. (It will be \(te^{\ld t}\x\).) Step 1 needs to diagonalize \(A=X\Ld X\im\): a basis of \(n\) eigenvectors.

Second Order Equations

The most important equation in mechanics is \(my\ppr+by\pr+ky=0\). This is a second-order equation because it contains the second derivative \(y\ppr=d^2y/dt^2\). It is still linear with constant coefficients \(m,b,k\).

The method of solution is to substitute \(y=e^{\ld t}\). Each derivative of \(y\) brings down a factor \(\ld\). We want \(y=e^{\ld t}\) to solve the equation:

Note

\(\dp m\frac{d^2y}{dt^2}+b\frac{dy}{dt}+ky=0\) becomes \((m\ld^2+b\ld+k)e^{\ld t}=0\).

Everything depends on \(m\ld^2+b\ld+k=0\). This equation for \(\ld\) has two roots \(\ld_1\) and \(\ld_2\). Then the equation for \(y\) has two pure solutions \(y_1=e^{\ld_1t}\) and \(y_2=e^{\ld_2t}\). Their combinations \(c_1y_1+c_2y_2\) give the complete solution unless \(\ld_1=\ld_2\).

We turn the scalar equation (with \(y\ppr\)) into a vector equation for \(y\) and \(y\pr\): first derivative only. Suppose \(m=1\). Two equations for \(\u=(y,y\pr)\) give \(d\u/dt=A\u\):

\[\begin{split}\begin{matrix}dy/dt=y\pr\\dy\pr/dt=-ky-by\pr\end{matrix}\quad\rm{converts\ to} \quad\frac{d}{dt}\bb y\\y\pr \eb=\bb 0&1\\-k&-b \eb\bb y\\y\pr \eb=A\u.\end{split}\]

The first equation \(dy/dt=y\pr\) is trivial (but true). The second equation connects \(y\ppr\) to \(y\pr\) and \(y\). Together they connect \(\u\pr\) to \(\u\). So we solve \(\u\pr=A\u\) by eigenvalues of \(A\):

Note

\(A-\ld I=\bb -\ld&1\\-k&-b-\ld \eb\) has determinant \(\ld^2+b\ld+k=0\).

The equation for the \(\ld\)’s is still \(\ld^2+b\ld+k=0\), since \(m=1\). The roots \(\ld_1\) and \(\ld_2\) are now eigenvalues of \(A\). The eigenvectors and the solution are:

\[\begin{split}\x_1=\bb 1\\\ld_1 \eb\quad\x_2=\bb 1\\\ld_2 \eb\quad\u(t)=c_1e^{\ld_1t}\bb 1\\\ld_1 \eb+c_2e^{\ld_2t}\bb 1\\\ld_2 \eb.\end{split}\]

The first component of \(\u(t)\) has \(y=c_1e^{\ld_1t}+c_2e^{\ld_2t}\)–the same solution as before. The vector problem is completely consistent with the scalar problem. The 2 by 2 matrix \(A\) is called a companion matrix– a companion to the second order equation with \(y\ppr\).

Difference Equations (optional)

To display a circle on a screen, replace \(y\ppr=-y\) by a difference equation. Here are three choices using \(Y(t+\Delta t)-2Y(t)+Y(t-\Delta t)\). Divide by \((\Delta t)^2\) to approximate \(y\ppr\).

Note

\(\dp\begin{matrix}F\\C\\B\end{matrix}\quad \begin{matrix}\rm{Forward\ from\ }n-1\ \ \\\rm{Centered\ at\ time\ }n\quad\ \\\rm{Backward\ from\ }n+1\end{matrix}\quad \frac{Y_{n+1}-2Y_n+Y_{n-1}}{(\Delta t)^2}= \begin{matrix}-Y_{n-1}\\-Y_n\ \ \ \ \\-Y_{n+1}\end{matrix}\)

The three difference methods don’t complete a perfect circle in 32 time steps of length \(\Delta t=2\pi/32\). Those pictures will be explained by eigenvalues:

Forward \(|\ld|>1\) (spiral out) Centered \(|\ld|=1\) (best) Backward \(|\ld|<1\) (spiral in)

The 2-step equations reduce to 1-step systems \(\bs{U}_{n+1}=A\bs{U}_n\). Instead of \(\u=(y,y\pr)\) the discrete unknown is \(\bs{U}_n=(Y_n,Z_n)\). We take \(n\) time steps \(\Delta t\) starting from \(\bs{U}_0\):

Note

Forward: \(\begin{matrix}Y_{n+1}=Y_n+\Delta tZ_n\\ Z_{n+1}=Z_n-\Delta tY_n\end{matrix}\) becomes \(\bs{U}_{n+1}=\bb 1&\Delta t\\-\Delta t&1\eb\bb Y_n\\Z_n\eb=A\bs{U}_n\).

Those are like \(Y\pr=Z\) and \(Z\pr=-Y\). They are first order equations involving times \(n\) and \(n+1\). Eliminating \(Z\) would bring back the “forward” second order equation.

Note

Eigenvalues of \(A\): \(\ld=1\pm i\Delta t\). Then \(|\ld|>1\) and \((Y_n,Z_n)\) spirals out.

Backward: \(\begin{matrix}Y_{n+1}=Y_n+\Delta tZ_{n+1}\\ Z_{n+1}=Z_n-\Delta tY_{n+1}\end{matrix}\) is \(\bb 1&-\Delta t\\\Delta t&1\eb \bb Y_{n+1}\\Z_{n+1}\eb=\bb Y_n\\Z_n \eb=\bs{U}_n\).

That matrix has eigenvalues \(1\pm i\Delta t\). But we invert it to reach \(\bs{U}_{n+1}\) from \(\bs{U}_n\). Then \(|\ld|<1\) explains why the solution spirals in to \((0,0)\) for backward differences.

The second difference \(Y_{n+1}-2Y_n+Y_{n-1}\) is the leapfrog method that “leaps over” that center value \(Y_n\) and is constantly used.

Stability of 2 by 2 Matrices

For the solution of \(d\u/dt=A\u\), there is a fundamental question. Does the solution approach \(\u=\0\) as \(t\rightarrow \infty\)? A solution that includes \(e^t\) is unstable. Stability depends on the eigenvalues of \(A\).

The complete solution \(\u(t)\) is built from pure solutions \(e^{\ld t}\x\). If the eigenvalue \(\ld\) is real, the number \(\ld\) must be negative for \(e^{\ld t}\) to approach zero. If the eigenvalue is a complex number \(\ld=r+is\), the real part \(r\) must be negative. When \(e^{\ld t}\) splits into \(e^{rt}e^{ist}\), the factor \(e^{ist}\) has absolute value fixed at 1:

\[e^{ist}=\cos st+i\sin st\quad\rm{has}\quad|e^{ist}|^2=\cos^2st+\sin^2st=1.\]

The real part of \(\ld\) controls the growth (\(r>0\)) or the decay (\(r<0\)).

Which matrices have negative eigenvalues? When are the real parts of the \(\ld\)‘s all negative?

Note

Stability: \(A\) is stable and \(\u(t)\rightarrow\0\) when all eigenvalues \(\ld\) have negative real parts. The 2 by 2 matrix \(A=\bb a&b\\c&d \eb\) must pass two tests:

  • \(\ld_1+\ld_2<0\): The trace \(T=a+d\) must be negative.

  • \(\ld_1\ld_2>0\): The determinant \(D=ad-bc\) must be positive.

Reason: If \(\ld\)’s are real and negative, their sum is negative. This is the trace \(T\). Their product is positive. This is the determinant \(D\). The argument also goes in the reverse direction. If \(D=\ld_1\ld_2\) is positive, then \(\ld_1\) and \(\ld_2\) have the same sign. If \(T=\ld_1+\ld_2\) is negative, that sign will be negative. We can test \(T\) and \(D\).

If the \(\ld\)’s are complex numbers, they must have the form \(r+is\) and \(r-is\). Otherwise \(T\) and \(D\) will not be real. The determinant \(D\) is automatically positive, since \((r+is)(r-is)=r^2+s^2\). The trace \(T\) is \(r+is+r-is=2r\). So a negative trace \(T\) means that the real part is negative and the matrix is stable.
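A small sketch of the 2 by 2 stability test (assuming NumPy; the example matrix is made up):

    import numpy as np

    def is_stable_2x2(A):
        """Stable when trace = a + d < 0 and determinant = ad - bc > 0."""
        trace = A[0, 0] + A[1, 1]
        det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
        return trace < 0 and det > 0

    A = np.array([[-2.0, 1.0],
                  [ 1.0, -3.0]])
    print(is_stable_2x2(A))                        # True
    print(np.all(np.linalg.eigvals(A).real < 0))   # True: all real parts are negative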

The Exponential of a Matrix

We want to write the solution \(\u(t)\) in a new form \(e^{At}\u(0)\). The direct definition of \(e^x\) is by the infinite series \(1+x+\frac{1}{2}x^2+\frac{1}{6}x^3+\cds\). Change \(x\) to a square matrix \(At\) to define the matrix exponential \(e^{At}\):

Note

Matrix exponential \(e^{At}\): \(e^{At}=I+At+\frac{1}{2}(At)^2+\frac{1}{6}(At)^3+\cds\).

Its \(t\) derivative is \(Ae^{At}\): \(A+A^2t+\frac{1}{2}A^3t^2+\cds=Ae^{At}\).

Its eigenvalues are \(e^{\ld t}\): \((I+At+\frac{1}{2}(At)^2+\cds)\x=(1+\ld t+\frac{1}{2}(\ld t)^2+\cds)\x=e^{\ld t}\x\).

The number that divides \((At)^n\) is \(n!\). The factorials grow quickly. The series always converges and its derivative is always \(Ae^{At}\). Therefore \(e^{At}\u(0)\) solves the differential equation with one quick formula–even if there is a shortage of eigenvectors.

Assume \(A\) does have \(n\) independent eigenvectors, so it is diagonalizable. Substitute \(A=X\Ld X\im\) into the series for \(e^{At}\). Whenever \(X\Ld X\im X\Ld X\im\) appears, cancel \(X\im X\) in the middle:

Use the series: \(e^{At}=I+X\Ld X\im t+\frac{1}{2}(X\Ld X\im t)(X\Ld X\im t)+\cds\)

Factor out \(X\) and \(X\im\): \(e^{At}=X[I+\Ld t+\frac{1}{2}(\Ld t)^2+\cds]X\im\)

\(e^{At}\) is diagonalized: \(e^{At}=Xe^{\Ld t}X\im\).

\(e^{At}\) has the same eigenvector matrix \(X\) as \(A\). Then \(\Ld\) is a diagonal matrix and so is \(e^{\Ld t}\). The numbers \(e^{\ld_it}\) are on the diagonal. Multiply \(Xe^{\Ld t}X\im\u(0)\) to recognize \(\u(t)\):

\[\begin{split}e^{At}\u(0)=Xe^{\Ld t}X\im\u(0)=\bb \\\ \x_1&\cds&\x_n \\\ \eb \bb e^{\ld_1t}\\&\dds\\&&e^{\ld_nt} \eb\bb c_1\\\vds\\c_n \eb.\end{split}\]

The solution \(e^{At}\u(0)\) is the same answer that came earlier from three steps:

  1. \(\u(0)=c_1\x_1+\cds+c_n\x_n=X\bs{c}\). Here we need \(n\) independent eigenvectors.

  2. Multiply each \(\x_i\) by its growth factor \(e^{\ld_it}\) to follow it forward in time.

  3. The best form of \(e^{At}\u(0)\) is \(\u(t)=c_1e^{\ld_1t}\x_1+\cds+c_ne^{\ld_nt}\x_n\).
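A sketch comparing \(Xe^{\Ld t}X\im\) with a series-based matrix exponential (assuming NumPy and SciPy's expm; the matrix is the earlier 2 by 2 example):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    t = 1.0

    vals, X = np.linalg.eig(A)
    eAt_eig = X @ np.diag(np.exp(vals * t)) @ np.linalg.inv(X)   # X e^{Lambda t} X^{-1}
    print(np.allclose(eAt_eig, expm(A * t)))                     # True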

For an antisymmetric matrix (\(A^T=-A\)), its exponential \(e^{At}\) is an orthogonal matrix. The eigenvalues of \(A\) are \(i\) and \(-i\). The eigenvalues of \(e^{At}\) are \(e^{it}\) and \(e^{-it}\). Three rules:

  1. \(e^{At}\) always has the inverse \(e^{-At}\).

  2. The eigenvalues of \(e^{At}\) are always \(e^{\ld t}\).

  3. When \(A\) is antisymmetric, \(e^{At}\) is orthogonal. \(Inverse = transpose = e^{-At}\).

Antisymmetric is the same as “skew-symmetric”. Those matrices have pure imaginary eigenvalues like \(i\) and \(-i\). Then \(e^{At}\) has eigenvalues like \(e^{it}\) and \(e^{-it}\). Their absolute value is 1: neutral stability, pure oscillation, energy is conserved. So \(\lv\u(t)\rv=\lv\u(0)\rv\).

Notes on a Differential Equations Course

  1. The second order equation \(mu\ppr+bu\pr+ku=0\) has major importance in applications. The exponents \(\ld\) in the solution \(u=e^{\ld t}\) solve \(m\ld^2+b\ld+k=0\). The damping coefficient \(b\) is crucial:

    Underdamping: \(b^2<4mk\); Critical damping: \(b^2=4mk\); Overdamping: \(b^2>4mk\).

    This decides whether \(\ld_1\) and \(\ld_2\) are real roots or repeated roots or complex roots. With complex \(\ld\)’s the solution \(u(t)\) oscillates as it decays.

  2. Our equation had no forcing term \(f(t)\). We were finding the “nullspace solution”. To \(\u_n(t)\) we need to add a particular solution \(u_p(t)\) that balances the force \(f(t)\):

    Input \(f(t)\) at time \(s\); Growth factor \(e^{A(t-s)}\); Add up output at time \(t\):

    \[\u_{\rm{particular}}=\int_0^t e^{A(t-s)}f(s)\ ds.\]

    This solution can also be discovered and studied by Laplace transform–that is the established way to convert linear differential equations to linear algebra.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 6.4 Symmetric Matrices

What is special about \(S\x=\ld\x\) when \(S\) is symmetric?

Note

1. A symmetric matrix has only real eigenvalues.

2. The eigenvectors can be chosen orthonormal.

Those \(n\) orthonormal eigenvectors go into the columns of \(X\). Every symmetric matrix can be diagonalized. Its eigenvector matrix \(X\) becomes an orthogonal matrix \(Q\). Orthogonal matrices have \(Q\im=Q^T\).

The eigenvectors do not have to be unit vectors, but we can choose them to have length 1. Then \(A=X\Ld X\im\) is in its special and particular form \(S=Q\Ld Q\im\) for symmetric matrices.

Note

(Spectral Theorem): Every symmetric matrix has the factorization \(S=Q\Ld Q^T\) with real eigenvalues in \(\Ld\) and orthonormal eigenvectors in the columns of \(Q\):

  • Symmetric diagonalization: \(S=Q\Ld Q\im=Q\Ld Q^T\) with \(Q\im=Q^T\).

The spectral theorem will be proved in three steps:

  1. By an example, showing real \(\ld\)’s in \(\Ld\) and orthonormal \(\x\)’s in \(Q\).

  2. By a proof of those facts when no eigenvalues are repeated.

  3. By a proof that allows repeated eigenvalues.

For \(S=\bb 1&2\\2&4 \eb\), the determinant of \(S-\ld I\) is \(\ld^2-5\ld\). The eigenvalues are 0 and 5 (both real). \(\ld=0\) is an eigenvalue because \(S\) is singular, and \(\ld=5\) matches the trace down the diagonal of \(S\): \(0+5\) agrees with \(1+4\).

Two eigenvectors are \((2,-1)\) and \((1,2)\)–orthogonal but not yet orthonormal. The eigenvector for \(\ld=0\) is in the nullspace of \(S\). The eigenvector for \(\ld=5\) is in the column space. The Fundamental Theorem says that the nullspace is perpendicular to the row space–not the column space. But our matrix is symmetric! Its row and column spaces are the same. Its eigenvectors \((2,-1)\) and \((1,2)\) must be (and are) perpendicular.

Divide them by their lengths \(\sqrt{5}\) to get unit vectors. Put those unit eigenvectors into the columns of \(Q\). Then \(Q\im SQ\) is \(\Ld\) and \(Q\im=Q^T\).

\[\begin{split}Q\im SQ=\frac{1}{\sqrt{5}}\bb 2&-1\\1&2 \eb\bb 1&2\\2&4 \eb\frac{1}{\sqrt{5}}\bb 2&1\\-1&2 \eb=\bb 0&0\\0&5 \eb=\Ld.\end{split}\]
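A check of \(S=Q\Ld Q^T\) for this matrix (assuming NumPy; eigh is the symmetric eigenvalue routine and returns orthonormal eigenvectors):

    import numpy as np

    S = np.array([[1.0, 2.0],
                  [2.0, 4.0]])

    vals, Q = np.linalg.eigh(S)                         # real eigenvalues 0 and 5
    print(vals)
    print(np.allclose(Q.T @ Q, np.eye(2)))              # columns of Q are orthonormal
    print(np.allclose(Q @ np.diag(vals) @ Q.T, S))      # S = Q Lambda Q^T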

Now comes the \(n\) by \(n\) case. The \(\ld\)’s are real when \(S=S^T\) and \(S\x=\ld\x\).

Note

Real Eigenvalues: All the eigenvalues of a real symmetric matrix are real.

Proof: Suppose that \(S\x=\ld\x\). Until we know otherwise, \(\ld\) might be a complex number \(a+ib\) (\(a\) and \(b\) real). Its complex conjugate is \(\bar{\ld}=a-ib\). Similarly the components of \(\x\) may be complex numbers, and switching the signs of their imaginary parts gives \(\bar{\x}\).

\(\bar{\ld}\) times \(\bar{\x}\) is always the conjugate of \(\ld\) times \(\x\). So we can take conjugates of \(S\x=\ld\x\), remembering that \(S\) is real:

\[S\x=\ld\x\rm{\ leads\ to\ }S\bar{\x}=\bar{\ld}\bar{\x}.\quad\rm{Transpose\ to}\quad\bar{\x}^TS=\bar{\x}^T\bar{\ld}.\]

Multiply the first equation by \(\bar{\x}^T\) and multiply the transposed equation by \(\x\). The left sides are the same, so the right sides are equal. One equation has \(\ld\), the other has \(\bar{\ld}\). Both multiply \(\bar{\x}^T\x=|x_1|^2+|x_2|^2+\cds=\) length squared, which is not zero. Therefore \(\ld\) must equal \(\bar{\ld}\), and \(a+ib\) equals \(a-ib\). So \(b=0\) and \(\ld=a\) is real. Q.E.D.

The eigenvectors come from solving the real equation \((S-\ld I)\x=\0\). So the \(\x\)’s are also real. The important fact is that they are perpendicular.

Note

Orthogonal Eigenvectors: Eigenvectors of a real symmetric matrix (when they correspond to different \(\ld\)’s) are always perpendicular.

Proof: Suppose \(S\x=\ld_1\x\) and \(S\y=\ld_2\y\). We are assuming here that \(\ld_1\neq\ld_2\). Take dot products of the first equation with \(\y\) and the second with \(\x\):

Use \(S^T=S\):

\[(\ld_1\x)^T\y=(S\x)^T\y=\x^TS^T\y=\x^TS\y=\x^T\ld_2\y.\]

The left side is \(\x^T\ld_1\y\), the right side is \(\x^T\ld_2\y\). Since \(\ld_1\neq\ld_2\), this proves that \(\x^T\y=0\). The eigenvector \(\x\) (for \(\ld_1\)) is perpendicular to the eigenvector \(\y\) (for \(\ld_2\)).

The eigenvectors of a 2 by 2 symmetric matrix have a special form:

Not widely known:

\[\begin{split}S=\bb a&b\\b&c \eb\rm{\ has\ }\x_1=\bb b\\\ld_1-a \eb\rm{\ and\ }\x_2=\bb \ld_2-c\\b \eb.\end{split}\]

\(\x_1\) is perpendicular to \(\x_2\):

\[\x_1^T\x_2=b(\ld_2-c)+(\ld_1-a)b=b(\ld_1+\ld_2-a-c)=0.\]

This is zero because \(\ld_1+\ld_2\) equals the trace \(a+c\). Thus \(\x_1^T\x_2=0\). (When \(b=0\), as for \(S=I\), the matrix is already diagonal and the coordinate vectors are its eigenvectors.)

Symmetric matrices \(S\) have orthogonal eigenvector matrices \(Q\):

Tip

Symmetry: \(S=X\Ld X\im\) becomes \(S=Q\Ld Q^T\) with \(Q^TQ=I\).

This says that every 2 by 2 symmetric matrix is (rotation)(stretch)(rotate back):

\[\begin{split}S=Q\Ld Q^T=\bb \\\ \q_1&\q_2 \\\ \eb\bb \ld_1\\&\ld_2 \eb\bb &\q_1^T&\\&\q_2^T \eb.\end{split}\]

Columns \(\q_1\) and \(\q_2\) multiply rows \(\ld_1\q_1^T\) and \(\ld_2\q_2^T\) to produce \(S=\ld_1\q_1\q_1^T+\ld_2\q_2\q_2^T\).

Tip

Every symmetric matrix: \(S=Q\Ld Q^T=\ld_1\q_1\q_1^T+\cds+\ld_n\q_n\q_n^T\).

Note

\(S\) has the correct eigenvectors, and those \(\q\)’s are orthonormal (a numerical check follows this note):

  • \(S\q_i=(\ld_1\q_1\q_1^T+\cds+\ld_n\q_n\q_n^T)\q_i=\ld_i\q_i\).
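Here is the numerical check promised above: a minimal sketch assuming NumPy and a sample symmetric matrix \(S\) that is not from the text. It rebuilds \(S\) from the rank-one pieces \(\ld_i\q_i\q_i^T\).

    import numpy as np

    S = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])

    lam, Q = np.linalg.eigh(S)   # real eigenvalues, orthonormal eigenvector columns

    # Sum of rank-one pieces lambda_i * q_i q_i^T gives back S
    S_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(3))
    print(np.allclose(S_rebuilt, S))        # True
    print(np.allclose(Q.T @ Q, np.eye(3)))  # True: the q's are orthonormal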

Complex Eigenvalues of Real Matrices

Note

For real matrices, complex \(\ld\)‘s and \(\x\)‘s come in “conjugate pairs.”

  • \(\begin{matrix}\ld=a+ib\\\bar{\ld}=a-ib\end{matrix}\quad\) If \(A\x=\ld\x\) then \(A\bar{\x}=\bar{\ld}\bar{\x}\).

\(A=\bb \cos\theta&-\sin\theta\\\sin\theta&\cos\theta\eb\) has \(\ld_1=\cos\theta+i\sin\theta\) and \(\ld_2=\cos\theta-i\sin\theta\).

Those eigenvalues are conjugate to each other. They are \(\ld\) and \(\bar{\ld}\). The eigenvectors must be \(\x\) and \(\bar{\x}\), because \(A\) is real:

  • This is \(\ld\x\): \(A\x=\bb \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\eb\bb 1\\-i \eb=(\cos\theta+i\sin\theta)\bb 1\\-i\eb\).

  • This is \(\bar{\ld}\bar{\x}\): \(A\bar{\x}=\bb \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\eb\bb 1\\i \eb=(\cos\theta-i\sin\theta)\bb 1\\i\eb\).

For this rotation matrix the absolute value is \(|\ld|=1\), because \(\cos^2\theta+\sin^2\theta=1\). This fact \(|\ld|=1\) holds for the eigenvalues of every orthogonal matrix \(Q\).
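A short check of this conjugate pair, a minimal sketch assuming NumPy and the sample angle \(\theta=0.7\):

    import numpy as np

    theta = 0.7
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    lam, X = np.linalg.eig(A)
    print(lam)          # cos(theta) +/- i sin(theta): a conjugate pair
    print(np.abs(lam))  # both absolute values equal 1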

Eigenvalues versus Pivots

The only connection between eigenvalues and pivots is:

product of pivots = determinant = product of eigenvalues.

We are assuming a full set of pivots \(d_1,\cds,d_n\). There are \(n\) real eigenvalues \(\ld_1,\cds,\ld_n\). The \(d\)’s and \(\ld\)’s are not the same, but they come from the same symmetric matrix. For symmetric matrices the pivots and the eigenvalues have the same signs:

Tip

The number of positive eigenvalues of \(S=S^T\) equals the number of positive pivots.

Special case: \(S\) has all \(\ld_i>0\) if and only if all pivots are positive.

That special case is an all-important fact for positive definite matrices.

When the pivots are divided out of the rows of \(U\), then \(S=LDL^T\). The diagonal pivot matrix \(D\) goes between the triangular matrices \(L\) and \(L^T\).

Tip

Watch the eigenvalues of \(LDL^T\) when \(L\) moves to \(I\). \(S\) changes to \(D\).

Move \(L\) toward \(I\), by moving the off-diagonal entries to zero. The pivots are not changing and not zero. The eigenvalues \(\ld\) of \(LDL^T\) change to the eigenvalues \(d\) of \(IDI^T\). Since these eigenvalues cannot cross zero as they move toward the pivots, their signs cannot change. Same signs for the \(\ld\)’s and \(d\)’s.

This connects the two halves of applied linear algebra–pivots and eigenvalues.
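A small illustration of the matching signs, a minimal sketch assuming NumPy; the helper pivots below is not from the text and only handles matrices that need no row exchanges.

    import numpy as np

    def pivots(S):
        # Elimination without row exchanges; the pivots appear on the diagonal of U
        U = S.astype(float).copy()
        n = U.shape[0]
        for k in range(n - 1):
            for i in range(k + 1, n):
                U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
        return np.diag(U)

    S = np.array([[1.0, 2.0, 0.0],
                  [2.0, 1.0, 3.0],
                  [0.0, 3.0, 1.0]])

    d = pivots(S)                    # pivots 1, -3, 4
    lam = np.linalg.eigvalsh(S)      # real eigenvalues of the symmetric S
    print(np.sum(d > 0), np.sum(lam > 0))   # same count of positive signs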

All Symmetric Matrices are Diagonalizable

When no eigenvalues of \(A\) are repeated, the eigenvectors are sure to be independent. Then \(A\) can be diagonalized. But a repeated eigenvalue can produce a shortage of eigenvectors. This sometimes happens for nonsymmetric matrices. It never happens for symmetric matrices. There are always enough eigenvectors to diagonalize \(S=S^T\).

One idea is to change \(S\) slightly by adding a small diagonal matrix \(\bs{\rm{diag}}(c,2c,\cds,nc)\). If \(c\) is very small, the new symmetric matrix will have no repeated eigenvalues. Then we know it has a full set of orthonormal eigenvectors. As \(c\rightarrow 0\) we obtain \(n\) orthonormal eigenvectors of the original \(S\)–even if some eigenvalues of that \(S\) are repeated.

Schur’s Theorem: Every square \(A\) factors into \(QTQ\im\) where \(T\) is upper triangular and \(Q^T=Q\im\). If \(A\) has real eigenvalues then \(Q\) and \(T\) can be chosen real: \(Q^TQ=I\).

Singular values come from \(A^TA\) and \(AA^T\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 6.5 Positive Definite Matrices

Symmetric matrices with all positive eigenvalues are called positive definite.

Here are two goals of this section:

  • To find quick tests on a symmetric matrix that guarantee positive eigenvalues.

  • To explain important applications of positive definiteness.

Every eigenvalue is real because the matrix is symmetric.

Start with 2 by 2. When does \(S=\bb a&b\\b&c \eb\) have \(\ld_1>0\) and \(\ld_2>0\)?

Note

Test: The eigenvalues of \(S\) are positive if and only if \(a > 0\) and \(ac-b^2>0\).

Proof that the 2 by 2 test is passed when \(\ld_1>0\) and \(\ld_2>0\). Their product \(\ld_1\ld_2\) is the determinant so \(ac-b^2>0\). Their sum \(\ld_1+\ld_2\) is the trace so \(a+c>0\). Then \(a\) and \(c\) are both positive (if \(a\) or \(c\) is not positive, \(ac-b^2>0\) will fail).

This test uses the 1 by 1 determinant \(a\) and the 2 by 2 determinant \(ac-b^2\). When \(S\) is 3 by 3, \(\det S>0\) is the third part of the test. The next test requires positive pivots.

Note

Test: The eigenvalues of \(S\) are positive if and only if the pivots are positive:

  • \(a>0\) and \(\dp\frac{ac-b^2}{a}>0\).

\[\begin{split}\bb a&b\\b&c \eb \xrightarrow[\rm{The\ multiplier\ is\ }b/a] {\rm{The\ first\ pivot\ is\ }a}\bb a&b\\0&c-\frac{b}{a}b \eb.\end{split}\]

This connects two big parts of linear algebra. Positive eigenvalues mean positive pivots and vice versa. Each pivot is a ratio of upper left determinants. The pivots give a quick test for \(\ld>0\), and they are a lot faster to compute than the eigenvalues.

Energy-based Definition

From \(S\x=\ld\x\), multiply by \(\x^T\) to get \(\x^TS\x=\ld\x^T\x\). The right side is a positive \(\ld\) times a positive number \(\x^T\x=\lv\x\rv^2\). So the left side \(\x^TS\x\) is positive for any eigenvector.

Important point: The new idea is that \(\x^TS\x\) is positive for all nonzero vectors \(\x\), not just the eigenvectors. In many applications this number \(\x^TS\x\) (or \(\frac{1}{2}\x^TS\x\) ) is the energy in the system. The requirement of positive energy gives another definition of a positive definite matrix.

Eigenvalues and pivots are two equivalent ways to test the new requirement \(\x^TS\x>0\).

Note

Definition: \(S\) is positive definite if \(\x^TS\x>0\) for every nonzero vector \(\x\):

  • 2 by 2: \(\x^TS\x=\bb x&y \eb\bb a&b\\b&c \eb\bb x\\y \eb=ax^2+2bxy+cy^2>0\).

The four entries \(a,b,b,c\) give the four parts of \(\x^TS\x\). From \(a\) and \(c\) come the pure squares \(ax^2\) and \(cy^2\). From \(b\) and \(b\) off the diagonal come the cross terms \(bxy\) and \(byx\) (the same). Adding those four parts gives \(\x^TS\x\). This energy-based definition leads to a basic fact:

Tip

If \(S\) and \(T\) are symmetric positive definite, so is \(S+T\).

Reason: \(\x^T(S+T)\x\) is simply \(\x^TS\x+\x^TT\x\). Those two terms are positive (for \(\x\neq\0\)) so \(S+T\) is also positive definite.

\(\x^TS\x\) also connects with our final way to recognize a positive definite matrix. For any matrix \(A\), possibly rectangular, we know that \(S=A^TA\) is square and symmetric.

Test: If the columns of \(A\) are independent, then \(S=A^TA\) is positive definite.

\(\x^TS\x=\x^TA^TA\x=(A\x)^T(A\x)=\lv A\x \rv^2\). The vector \(A\x\) is not zero when \(\x\neq\0\) (this is the meaning of independent columns). Then \(\x^TS\x\) is the positive number \(\lv A\x \rv^2\) and the matrix \(S\) is positive definite.

Note

When a symmetric matrix \(S\) has one of these five properties, it has them all (a numerical check follows this list):

  1. All \(n\) pivots of \(S\) are positive.

  2. All \(n\) upper left determinants are positive.

  3. All \(n\) eigenvalues of \(S\) are positive.

  4. \(\x^TS\x\) is positive except at \(\x=\0\). This is the energy-based definition.

  5. \(S\) equals \(A^TA\) for a matrix \(A\) with independent columns.
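Here is the numerical check promised above, a minimal sketch assuming NumPy and a sample positive definite matrix \(S\) that is not from the text. The Cholesky factor supplies a matrix \(A\) with independent columns for test 5.

    import numpy as np

    S = np.array([[4.0, 1.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [1.0, 0.0, 2.0]])

    dets = [np.linalg.det(S[:k, :k]) for k in range(1, 4)]       # test 2
    pivs = [dets[0]] + [dets[k] / dets[k - 1] for k in (1, 2)]   # test 1: pivots are determinant ratios
    lam = np.linalg.eigvalsh(S)                                  # test 3
    rng = np.random.default_rng(0)
    energies = [x @ S @ x for x in rng.standard_normal((5, 3))]  # test 4 at a few nonzero x
    L = np.linalg.cholesky(S)                                    # test 5: S = L L^T = A^T A with A = L^T

    print(all(d > 0 for d in dets), all(p > 0 for p in pivs),
          np.all(lam > 0), all(e > 0 for e in energies),
          np.allclose(L @ L.T, S))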

Positive Semidefinite Matrices

Often we are at the edge of positive definiteness. The determinant is zero. The smallest eigenvalue is zero. The energy in its eigenvector is \(\x^TS\x=\x^T0\x=0\). These matrices on the edge are called positive semidefinite.

The matrix \(S=\bb 1&2\\2&4 \eb\) factors into \(A^TA\) with dependent columns in \(A\):

Dependent columns in \(A\); Positive semidefinite \(S\):

\[\begin{split}\bb 1&2\\2&4 \eb=\bb 1&0\\2&0 \eb\bb 1&2\\0&0 \eb=A^TA.\end{split}\]

If 4 is increased by any small number, the matrix \(S\) will become positive definite.

Positive semidefinite matrices have all \(\ld\geq 0\) and all \(\x^TS\x\geq 0\). Those weak inequalities (\(\geq\) instead of \(>\)) include positive definite \(S\) and also the singular matrices at the edge.

The Ellipse \(ax^2+2bxy+cy^2=1\)

Think of a tilted ellipse \(\x^TS\x=1\). Its center is \((0,0)\). Turn it to line up with the coordinate axes (\(X\) and \(Y\) axes). These two pictures show the geometry behind the factorization \(S=Q\Ld Q\im=Q\Ld Q^T\):

  1. The tilted ellipse is associated with \(S\). Its equation is \(\x^TS\x=1\).

  2. The lined-up ellipse is associated with \(\Ld\). Its equation is \(\bs{X}^T\Ld\bs{X}=1\).

  3. The rotation matrix that lines up the ellipse is the eigenvector matrix \(Q\).

The axes of the tilted ellipse point along those eigenvectors. This explains why \(S=Q\Ld Q^T\) is called the “principal axis theorem”–it displays the axes. Not only the axis directions (from the eigenvectors) but also the axis lengths (from the eigenvalues). Notice: The bigger eigenvalue \(\ld_1\) gives the shorter axis.

In the \(xy\) system, the axes are along the eigenvectors of \(S\). In the \(XY\) system, the axes are along the eigenvectors of \(\Ld\)–the coordinate axes. All comes from \(S=Q\Ld Q^T\).

Note

\(S=Q\Ld Q^T\) is positive definite when all \(\ld_i>0\). The graph of \(\x^TS\x=1\) is an ellipse:

  • \(\bb x&y \eb Q\Ld Q^T\bb x\\y \eb=\bb X&Y \eb\Ld\bb X\\Y \eb=\ld_1X^2+\ld_2Y^2=1\).

The axes point along eigenvectors of \(S\). The half-lengths are \(1/\sqrt{\ld_1}\) and \(1/\sqrt{\ld_2}\).
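A small check of those half-lengths, a minimal sketch assuming NumPy and the sample ellipse \(5x^2+8xy+5y^2=1\) (not from the text):

    import numpy as np

    S = np.array([[5.0, 4.0],
                  [4.0, 5.0]])       # x^T S x = 5x^2 + 8xy + 5y^2

    lam, Q = np.linalg.eigh(S)       # eigenvalues 1 and 9; columns of Q are the axis directions
    print(lam)                       # [1. 9.]
    print(1 / np.sqrt(lam))          # half-lengths 1 and 1/3: the bigger lambda gives the shorter axis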

\(S=I\) gives the circle \(x^2+y^2=1\). If one eigenvalue is negative, the ellipse changes to a hyperbola. The sum of squares becomes a difference of squares. For a negative definite matrix like \(S=-I\), with both \(\ld\)’s negative, the graph of \(-x^2-y^2=1\) has no points at all.

If \(S\) is \(n\) by \(n\), \(\x^TS\x=1\) is an “ellipsoid” in \(\R^n\). Its axes are the eigenvectors of \(S\).

Important Application: Test for a Minimum

For \(f(x)\), the test for a minimum comes from calculus: \(df/dx\) is zero and \(d^2f/dx^2>0\). Two variables in \(F(x,y)\) produce a symmetric matrix \(S\). It contains four second derivatives. Positive \(d^2f/dx^2\) changes to positive definite \(S\):

Second derivatives:

\[\begin{split}S=\bb \pd^2F/\pd x^2&\pd^2F/\pd x\pd y\\\pd^2F/\pd y\pd x&\pd^2F/\pd y^2 \eb.\end{split}\]

\(F(x,y)\) has a minimum if \(\pd F/\pd x=\pd F/\pd y=0\) and \(S\) is positive definite.

Reason: \(S\) reveals the all-important terms \(ax^2+2bxy+cy^2\) near \((x,y)=(0,0)\). The second derivatives of \(F\) are \(2a,2b,2b,2c\). For \(F(x,y,z)\) the matrix \(S\) will be 3 by 3.
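A minimal sketch of this test, assuming NumPy and the sample function \(F(x,y)=x^2+xy+2y^2\) (not from the text), whose second derivatives are constant:

    import numpy as np

    # F(x,y) = x^2 + xy + 2y^2 has dF/dx = dF/dy = 0 at (0,0)
    S = np.array([[2.0, 1.0],    # [[F_xx, F_xy],
                  [1.0, 4.0]])   #  [F_yx, F_yy]]

    print(np.all(np.linalg.eigvalsh(S) > 0))   # True: S is positive definite, so (0,0) is a minimum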

Table of Eigenvalues and Eigenvectors

Each entry lists the matrix, its eigenvalues, and its eigenvectors (for the SVD: singular values and singular vectors).

  • Symmetric: \(S^T=S=Q\Ld Q^T\). Eigenvalues: real. Eigenvectors: orthogonal \(\x_i^T\x_j=0\).

  • Orthogonal: \(Q^T=Q\im\). Eigenvalues: all \(|\ld|=1\). Eigenvectors: orthogonal \(\bar{\x}_i^T\x_j=0\).

  • Skew-symmetric: \(A^T=-A\). Eigenvalues: imaginary \(\ld\)’s. Eigenvectors: orthogonal \(\bar{\x}_i^T\x_j=0\).

  • Complex Hermitian: \(\bar{S}^T=S\). Eigenvalues: real \(\ld\)’s. Eigenvectors: orthogonal \(\bar{\x}_i^T\x_j=0\).

  • Positive Definite: \(\x^TS\x>0\). Eigenvalues: all \(\ld>0\). Eigenvectors: orthogonal since \(S^T=S\).

  • Markov: \(m_{ij}>0,\ \sum_{i=1}^nm_{ij}=1\). Eigenvalues: \(\ld_{\rm{max}}=1\). Eigenvectors: steady state \(\x>0\).

  • Similar: \(A=BCB\im\). Eigenvalues: \(\ld(A)=\ld(C)\). Eigenvectors: \(B\) times eigenvector of \(C\).

  • Projection: \(P=P^2=P^T\). Eigenvalues: \(\ld=1;0\). Eigenvectors: column space; nullspace.

  • Plane Rotation: cosine-sine. Eigenvalues: \(e^{i\th}\) and \(e^{-i\th}\). Eigenvectors: \(\x=(1,i)\) and \((1,-i)\).

  • Reflection: \(I-2\u\u^T\). Eigenvalues: \(\ld=-1;1,\cds,1\). Eigenvectors: \(\u\); whole plane \(\u^{\perp}\).

  • Rank One: \(\u\v^T\). Eigenvalues: \(\ld=\v^T\u;0,\cds,0\). Eigenvectors: \(\u\); whole plane \(\v^{\perp}\).

  • Inverse: \(A\im\). Eigenvalues: \(1/\ld(A)\). Eigenvectors: keep eigenvectors of \(A\).

  • Shift: \(A+cI\). Eigenvalues: \(\ld(A)+c\). Eigenvectors: keep eigenvectors of \(A\).

  • Stable Powers: \(A^n\rightarrow 0\). Eigenvalues: all \(|\ld|<1\). Eigenvectors: any eigenvectors.

  • Stable Exponential: \(e^{At}\rightarrow 0\). Eigenvalues: all \(\Re \ld<0\). Eigenvectors: any eigenvectors.

  • Cyclic Permutation: \(P_{i,i+1}=1;P_{n1}=1\). Eigenvalues: \(\ld_k=e^{2\pi ik/n}=\) roots of 1. Eigenvectors: \(\x_k=(1,\ld_k,\cds,\ld_k^{n-1})\).

  • Circulant: \(c_0I+c_1P+\cds\). Eigenvalues: \(\ld_k=c_0+c_1e^{2\pi ik/n}+\cds\). Eigenvectors: \(\x_k=(1,\ld_k,\cds,\ld_k^{n-1})\).

  • Tridiagonal: \(-1,2,-1\) on diagonals. Eigenvalues: \(\ld_k=2-2\cos\frac{k\pi}{n+1}\). Eigenvectors: \(\x_k=(\sin\frac{k\pi}{n+1},\sin\frac{2k\pi}{n+1},\cds)\).

  • Diagonalizable: \(A=X\Ld X\im\). Eigenvalues: diagonal of \(\Ld\). Eigenvectors: columns of \(X\) are independent.

  • Schur: \(A=QTQ\im\). Eigenvalues: diagonal of triangular \(T\). Eigenvectors: columns of \(Q\) if \(A^TA=AA^T\).

  • Jordan: \(A=BJB\im\). Eigenvalues: diagonal of \(J\). Eigenvectors: each block gives 1 eigenvector.

  • SVD: \(A=U\Sg V^T\). Singular values: \(r\) singular values in \(\Sg\). Singular vectors: eigenvectors of \(A^TA, AA^T\) in \(V,U\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 7 The Singular Value Decomposition (SVD)

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 7.1 Image Processing by Linear Algebra

The singular value theorem for \(A\) is the eigenvalue theorem for \(A^TA\) and \(AA^T\).

\(A\) has two sets of singular vectors (the eigenvectors of \(A^TA\) and \(AA^T\)). There is one set of positive singular values (because \(A^TA\) has the same positive eigenvalues as \(AA^T\)). \(A\) is often rectangular, but \(A^TA\) and \(AA^T\) are square, symmetric, and positive semidefinite.

The Singular Value Decomposition (SVD) separates any matrix into simple pieces.

Each piece is a column vector times a row vector. An \(m\) by \(n\) matrix has \(m\) times \(n\) entries (a big number when the matrix represents an image). But a column and a row only have \(m+n\) components, far fewer than \(m\) times \(n\).

Think of an image as a large rectangular matrix. The entries \(a_{ij}\) tell the grayscales of all the pixels in the image. Major success in compression will be impossible if every \(a_{ij}\) is an independent random number. We totally depend on the fact that nearby pixels generally have similar grayscales. An edge produces a sudden jump when you cross over it. Cartoons are more compressible than real-world images, with edges everywhere.

For a video, the numbers \(a_{ij}\) don’t change much between frames. We only transmit the small changes. This is difference coding in the H.264 video compression standard. We compress each change matrix by linear algebra (and by nonlinear “quantization” for an efficient step to integers in the computer).

Low Rank Images (Examples)

Suppose a matrix \(A\) has the same grayscale \(g\) in every entry: \(a_{ij}=g\). When \(g=1\) and \(m=n=6\), here is an extreme example of the central SVD dogma of image processing:

\[\begin{split}\rm{Don't\ send\ }A=\bb 1&1&1&1&1&1\\1&1&1&1&1&1\\1&1&1&1&1&1\\1&1&1&1&1&1\\1&1&1&1&1&1\\1&1&1&1&1&1 \eb\quad \rm{Send\ }A=\bb 1\\1\\1\\1\\1\\1 \eb\bb 1&1&1&1&1&1 \eb.\end{split}\]

If we define the all-ones vector \(\x\) in advance, we only have to send one number. That number would be the constant grayscale \(g\) that multiplies \(\x\x^T\) to produce the matrix.

If there are special vectors like \(\x=\) ones that can usefully be defined in advance, then image processing can be extremely fast. The battle is between preselected bases (the Fourier basis allows speed-up for the FFT) and adaptive bases determined by the image. The SVD produces bases from the image itself–this is adaptive and it can be expensive.

\[\begin{split}\rm{Don't\ send\ }A=\bb a&a&c&c&e&e\\a&a&c&c&e&e\\a&a&c&c&e&e\\a&a&c&c&e&e\\a&a&c&c&e&e\\a&a&c&c&e&e \eb\quad \rm{Send\ }A=\bb 1\\1\\1\\1\\1\\1 \eb\bb a&a&c&c&e&e \eb.\end{split}\]

This matrix has 3 values but it still has rank 1. We still have one column times one row. But when the rank moves up to \(r=2\), we need \(\u_1\v_1^T+\u_2\v_2^T\). Here is one choice:

Note

Embedded square: \(\bb 1&0\\1&1 \eb\) is equal to \(A=\bb 1\\1 \eb\bb 1&1 \eb-\bb 1\\0 \eb\bb 0&1 \eb\).

The SVD chooses rank one pieces in order of importance. If the rank of \(A\) is much higher than 2, as we expect for real images, then \(A\) will add up many rank one pieces. We want the small ones to be really small–they can be discarded with no loss to visual quality. Image compression becomes lossy, but good image compression is virtually undetectable by the human visual system.

Eigenvectors for the SVD

Use the eigenvectors \(\u\) of \(AA^T\) and the eigenvectors \(\v\) of \(A^TA\).

Since \(AA^T\) and \(A^TA\) are automatically symmetric (but not usually equal!) the \(\u\)’s will be one orthogonal set and the eigenvectors \(\v\) will be another orthogonal set. We can and will make them all unit vectors: \(\lv\u_i\rv=1\) and \(\lv\v_i\rv=1\). Then our rank 2 matrix will be \(A=\sg_1\u_1\v_1^T+\sg_2\u_2\v_2^T\). The size of those numbers \(\sg_1\) and \(\sg_2\) will decide whether they can be ignored in compression. We keep the larger \(\sg\)’s and discard the small \(\sg\)’s.

The \(\u\)’s from the SVD are called left singular vectors (unit eigenvectors of \(AA^T\)). The \(\v\)’s are right singular vectors (unit eigenvectors of \(A^TA\)). The \(\sg\)’s are singular values, square roots of the equal eigenvalues of \(AA^T\) and \(A^TA\):

Note

Choices from the SVD:

  • \(AA^T\u_i=\sg_i^2\u_i \quad A^TA\v_i=\sg_i^2\v_i \quad A\v_i=\sg_i\u_i\).

Note

\(A=\bb \\\ \u_1&\u_2 \\\ \eb\bb \sg_1\\&\sg_2 \eb\bb \v_1^T\\\v_2^T \eb\) or more simply \(A\bb \\\ \v_1&\v_2 \\\ \eb=\bb \\\ \sg_1\u_1&\sg_2\u_2 \\\ \eb\)

Important: The key point is not that images tend to have low rank. No: Images mostly have full rank. But they do have low effective rank. This means: Many singular values are small and can be set to zero. We transmit a low rank approximation.

Visual quality can be preserved even with a big reduction in the rank.
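The sketch below illustrates low effective rank, assuming NumPy and synthetic data rather than a real image: a nearly rank 3 matrix is rebuilt from only its three largest singular values with small error.

    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic "image": rank 3 structure plus small noise, so the effective rank is low
    A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 80)) \
        + 0.01 * rng.standard_normal((60, 80))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 3
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # keep only the k largest sigma's

    print(s[:5])                                         # sigma_4, sigma_5 are tiny
    print(np.linalg.norm(A - A_k) / np.linalg.norm(A))   # small relative error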

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 7.2 Bases and Matrices in the SVD

\(A\) is any \(m\) by \(n\) matrix, square or rectangular. Its rank is \(r\). We will diagonalize this \(A\), but not by \(X\im AX\). The eigenvectors in \(X\) have three big problems: They are usually not orthogonal, there are not always enough eigenvectors, and \(A\x=\ld\x\) requires \(A\) to be a square matrix. The singular vectors of \(A\) solve all those problems in a perfect way.

What we want from the SVD are the right bases for the four subspaces. The steps to find those basis vectors will be described in order of importance.

The price we pay is to have two sets of singular vectors, \(\u\)’s and \(\v\)’s. The \(\u\)’s are in \(\R^m\) and the \(\v\)’s are in \(\R^n\). They will be the columns of an \(m\) by \(m\) matrix \(\bs{U}\) and an \(n\) by \(n\) matrix \(\bs{V}\).

Using vectors: The \(\u\)’s and \(\v\)’s give bases for the four fundamental subspaces:

Note

  • \(\u_1,\cds,\u_r\) is an orthonormal basis for the column space

  • \(\u_{r+1},\cds,\u_m\) is an orthonormal basis for the left nullspace \(\N(A^T)\)

  • \(\v_1,\cds,\v_r\) is an orthonormal basis for the row space

  • \(\v_{r+1},\cds,\v_n\) is an orthonormal basis for the nullspace \(\N(A)\).

More than just orthogonality, these basis vectors diagonalize the matrix \(A\):

Note

“\(A\) is diagonalized”: \(A\v_1=\sg_1\u_1\quad A\v_2=\sg_2\u_2\quad\cds\quad A\v_r=\sg_r\u_r\).

Those singular values \(\sg_1\) to \(\sg_r\) will be positive numbers: \(\sg_i\) is the length of \(A\v_i\). The \(\sg\)’s go into a diagonal matrix that is otherwise zero. That matrix is \(\Sg\).

Using matrices: Since the \(\u\)’s are orthonormal, the matrix \(U_r\) with those \(r\) columns has \(U_r^TU_r=I\). Since the \(\v\)’s are orthonormal, the matrix \(V_r\) has \(V_r^TV_r=I\). Then the equations \(A\v_i=\sg_i\u_i\) tell us column by column that \(AV_r=U_r\Sg_r\):

\[\begin{split}AV_r=U_r\Sg_r\quad A\bb \\\ \v_1&\cds&\v_r \\\ \eb=\bb \\\ \u_1&\cds&\u_r \\\ \eb\bb \sg_1\\&\dds\\&&\sg_r \eb.\end{split}\]

Those \(\v\)’s and \(\u\)’s account for the row space and column space of \(A\). We have \(n-r\) more \(\v\)’s and \(m-r\) more \(\u\)’s, from the nullspace \(\N(A)\) and the left nullspace \(\N(A^T)\). They are automatically orthogonal to the first \(\v\)’s and \(\u\)’s (because the whole nullspaces are orthogonal to the row space and column space). We now include all the \(\v\)’s and \(\u\)’s in \(V\) and \(U\), so these matrices become square. We still have \(AV=U\Sg\).

\[\begin{split}AV=U\Sg\quad A\bb \\\ \v_1\ \cds\ \v_r\ \cds\ \v_n \\\ \eb= \bb \\\ \u_1\ \cds\ \u_r\ \cds\ \u_m \\\ \eb\bb \sg_1\\&\dds\\&&\sg_r&& \\\ \eb.\end{split}\]

The new \(\Sg\) is \(m\) by \(n\). It is just the \(r\) by \(r\) matrix with \(m-r\) extra zero rows and \(n-r\) new zero columns. The real change is in the shapes of \(U\) and \(V\). Those are square matrices and \(V\im=V^T\). So \(AV=U\Sg\) becomes \(A=U\Sg V^T\). This is the Singular Value Decomposition. I can multiply columns \(\u_i\sg_i\) from \(U\Sg\) by rows of \(V^T\):

Note

SVD: \(A=U\Sg V^T=\u_1\sg_1\v_1^T+\cds+\u_r\sg_r\v_r^T\).

We will see that each \(\sg_i^2\) is an eigenvalue of \(A^TA\) and also \(AA^T\). When we put the singular values in descending order, \(\sg_1\geq\sg_2\geq\cds\geq\sg_r>0\), the splitting gives the \(r\) rank-one pieces of \(A\) in order of importance.

When is \(A=U\Sg V^T\) (singular values) the same as \(X\Ld X\im\) (eigenvalues)?

\(A\) needs orthonormal eigenvectors to allow \(X=U=V\). \(A\) also needs eigenvalues \(\ld\geq 0\) if \(\Ld=\Sg\). So \(A\) must be a positive semidefinite (or definite) symmetric matrix. Only then will \(A=X\Ld X\im\), which is also \(Q\Ld Q^T\), coincide with \(A=U\Sg V^T\).

Proof of the SVD

We need to show how those \(\u\)’s and \(\v\)’s can be constructed. The \(\v\)’s will be orthonormal eigenvectors of \(A^TA\). This must be true because we are aiming for

\[A^TA=(U\Sg V^T)^T(U\Sg V^T)=V\Sg^TU^TU\Sg V^T=V\Sg^T\Sg V^T.\]

On the right you see the eigenvector matrix \(V\) for the symmetric positive (semi) definite matrix \(A^TA\). And (\(\Sg^T\Sg\)) must be the eigenvalue matrix of (\(A^TA\)): Each \(\sg^2\) is \(\ld(A^TA)\).

Now \(A\v_i=\sg_i\u_i\) tells us the unit vectors \(\u_1\) to \(\u_r\). This is the key equation. The essential point–the whole reason that the SVD succeeds–is that those unit vectors \(\u_1\) to \(\u_r\) are automatically orthogonal to each other (because the \(\v\)’s are orthogonal):

Key step \(i\neq j\):

\[\u_i^T\u_j=\left(\frac{A\v_i}{\sg_i}\right)^T\left(\frac{A\v_j}{\sg_j}\right) =\frac{\v_i^TA^TA\v_j}{\sg_i\sg_j}=\frac{\sg_j^2}{\sg_i\sg_j}\v_i^T\v_j =\bs{\rm{zero}}.\]

The \(\v\)’s are eigenvectors of \(A^TA\) (symmetric). They are orthogonal and now the \(\u\)’s are also orthogonal. Actually those \(\u\)’s will be eigenvectors of \(AA^T\).

Finally we complete the \(\v\)’s and \(\u\)’s to \(n\) \(\v\)’s and \(m\) \(\u\)’s with any orthonormal bases for the nullspace \(\N(A)\) and \(\N(A^T)\). We have found \(V\) and \(\Sg\) and \(U\) in \(A=U\Sg V^T\).

An Example of the SVD

For a rank 2 matrix \(A=\bb 3&0\\4&5 \eb\):

\[\begin{split}A^TA=\bb 25&20\\20&25 \eb\quad AA^T=\bb 9&12\\12&41 \eb.\end{split}\]

Those have the same trace 50 and the same eigenvalues \(\sg_1^2=45\) and \(\sg_2^2=5\). The square roots are \(\sg_1=\sqrt{45}\) and \(\sg_2=\sqrt{5}\). Then \(\sg_1\sg_2=15\) and this is the determinant of \(A\).

Right singular vectors:

\[\begin{split}\v_1=\frac{1}{\sqrt{2}}\bb 1\\1 \eb\quad\v_2=\frac{1}{\sqrt{2}}\bb -1\\1 \eb.\end{split}\]

Left singular vectors:

\[\u_i=\frac{A\v_i}{\sg_i}.\]

Now compute \(A\v_1\) and \(A\v_2\) which will be \(\sg_1\u_1=\sqrt{45}\u_1\) and \(\sg_2\u_2=\sqrt{5}\u_2\):

\[ \begin{align}\begin{aligned}\begin{split}A\v_1=\frac{3}{\sqrt{2}}\bb 1\\3 \eb=\sqrt{45}\frac{1}{\sqrt{10}}\bb 1\\3 \eb=\sg_1\u_1\end{split}\\\begin{split}A\v_2=\frac{1}{\sqrt{2}}\bb -3\\1 \eb=\sqrt{5}\frac{1}{\sqrt{10}}\bb -3\\1 \eb=\sg_2\u_2\end{split}\end{aligned}\end{align} \]

Note

\(\dp U=\frac{1}{\sqrt{10}}\bb 1&-3\\3&1 \eb\quad \Sg=\bb \sqrt{45}\\&\sqrt{5} \eb\quad V=\frac{1}{\sqrt{2}}\bb 1&-1\\1&1 \eb\).

\(U\) and \(V\) contain orthonormal bases for the column space and the row space (both spaces are just \(\R^2\)). The matrix \(A\) splits into a combination of two rank-one matrices, columns times rows:

\[\begin{split}\sg_1\u_1\v_1^T+\sg_2\u_2\v_2^T=\frac{\sqrt{45}}{\sqrt{20}}\bb 1&1\\3&3 \eb+ \frac{\sqrt{5}}{\sqrt{20}}\bb 3&-3\\-1&1 \eb=\bb 3&0\\4&5 \eb=A.\end{split}\]
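This example can be confirmed numerically. The sketch below assumes NumPy; the computed singular vectors may differ from the ones above by a sign.

    import numpy as np

    A = np.array([[3.0, 0.0],
                  [4.0, 5.0]])

    U, s, Vt = np.linalg.svd(A)
    print(s)                                       # [sqrt(45), sqrt(5)]
    print(np.allclose(U @ np.diag(s) @ Vt, A))     # True

    # The rank-one pieces sigma_i u_i v_i^T add back to A
    pieces = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(2))
    print(np.allclose(pieces, A))                  # True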

An Extreme Matrix

The matrix \(A\) is badly lopsided (strictly triangular). All its eigenvalues are zero with the only eigenvector \((1,0,0,0)\). The singular values are \(\sg=3,2,1\) and singular vectors are columns of \(I\):

\[\begin{split}A=\bb 0&1&0&0\\0&0&2&0\\0&0&0&3\\0&0&0&0 \eb.\end{split}\]

\(A^TA\) and \(AA^T\) are diagonal:

\[\begin{split}A^TA=\bb 0&0&0&0\\0&1&0&0\\0&0&4&0\\0&0&0&9 \eb \quad AA^T=\bb 1&0&0&0\\0&4&0&0\\0&0&9&0\\0&0&0&0 \eb.\end{split}\]

The eigenvectors (\(\u\)’s for \(AA^T\) and \(\v\)’s for \(A^TA\)) go in decreasing order \(\sg_1^2>\sg_2^2>\sg_3^2\) of the eigenvalues. Those eigenvalues are \(\sg^2=9,4,1\).

\[\begin{split}U=\bb 0&0&1&0\\0&1&0&0\\1&0&0&0\\0&0&0&1\eb\quad\Sg=\bb 3\\&2\\&&1\\&&&0 \eb \quad V=\bb 0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0 \eb.\end{split}\]

Note

\(A=U\Sg V^T=3\u_1\v_1^T+2\u_2\v_2^T+1\u_3\v_3^T\).

Note: Removing the zero row of \(A\) (now \(3\times 4\)) just removes the last row of \(\Sg\) and also the last row and column of \(U\). Then \((3\times 4)=U\Sg V^T=(3\times 3)(3\times 4)(4\times 4)\). The SVD is totally adapted to rectangular matrices.

Singular Value Stability versus Eigenvalue Instability

The singular values of any matrix are stable.

Singular Vectors of \(A\) and Eigenvectors of \(S=A^TA\)

We have proved the SVD all at once. The singular vectors \(\v_i\) are the eigenvectors \(\q_i\) of \(S=A^TA\). The eigenvalues \(\ld_i\) of \(S\) are the same as \(\sg_i^2\) for \(A\). The rank \(r\) of \(S\) equals the rank of \(A\). The expansions in eigenvectors and singular vectors are perfectly parallel.

Note

  • Symmetric \(S\): \(S=Q\Ld Q^T=\ld_1\q_1\q_1^T+\ld_2\q_2\q_2^T+\cds+\ld_r\q_r\q_r^T\).

  • Any matrix \(A\): \(A=U\Sg V^T=\sg_1\u_1\v_1^T+\sg_2\u_2\v_2^T+\cds+\sg_r\u_r\v_r^T\).

The \(\q\)’s are orthonormal, the \(\u\)’s are orthonormal, the \(\v\)’s are orthonormal.

If \(\ld\) is a double eigenvalue of \(S\), we can and must find two orthonormal eigenvectors. We want to understand the eigenvalues \(\ld\) (of \(S\)) and the singular values \(\sg\) (of \(A\)) one at a time instead of all at once.

Start with the largest eigenvalue \(\ld_1\) of \(S\). It solves this problem:

\(\dp\ld_1=\rm{maximum\ ratio\ }\frac{\x^TS\x}{\x^T\x}\). The winning vector is \(\x_1=\q_1\) with \(S\q_1=\ld_1\q_1\).

Compare with the largest singular value \(\sg_1\) of \(A\). It solves this problem:

\(\dp\sg_1=\rm{maximum\ ratio\ }\frac{\lv A\x \rv}{\lv\x\rv}\). The winning vector is \(\x=\v_1\) with \(A\v_1=\sg_1\u_1\).

This “one at a time approach” applies also to \(\ld_2\) and \(\sg_2\). But not all \(\x\)’s are allowed:

\(\dp\ld_2=\rm{maximum\ ratio\ }\frac{\x^TS\x}{\x^T\x}\) among all \(\x\)’s with \(\q_1^T\x=0\). \(\x=\q_2\) will win.

\(\dp\sg_2=\rm{maximum\ ratio\ }\frac{\lv A\x \rv}{\lv\x\rv}\) among all \(\x\)’s with \(\v_1^T\x=0\). \(\x=\v_2\) will win.

When \(S=A^TA\) we find \(\ld_1=\sg_1^2\) and \(\ld_2=\sg_2^2\).

Start with the ratio \(r(\x)=\x^TS\x/\x^T\x\). This is called the Rayleigh quotient. To maximize \(r(\x)\), set its partial derivatives to zero: \(\pd r/\pd x_i=0\) for \(i=1,\cds,n\). Those derivatives are messy and here is the result: one vector equation for the winning \(\x\):

\[\rm{The\ derivatives\ of\ }r(\x)=\frac{\x^TS\x}{\x^T\x}\rm{\ are\ zero\ when\ }S\x=r(\x)\x\]

So the winning \(\x\) is an eigenvector of \(S\). The maximum ratio \(r(\x)\) is the largest eigenvalue \(\ld_1\) of \(S\). Notice the connection to \(S=A^TA\):

\[\rm{Maximizing\ }\frac{\lv A\x \rv}{\lv\x\rv}\rm{\ also\ maximizes\ }\left( \frac{\lv A\x \rv}{\lv\x\rv}\right)^2=\frac{\x^TA^TA\x}{\x^T\x}= \frac{\x^TS\x}{\x^T\x}\]

So the winning \(\x=\v_1\) is the same as the top eigenvector \(\q_1\) of \(S=A^TA\).
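A rough illustration of these two maximum problems, a minimal sketch assuming NumPy and random sample vectors (the maxima over a finite sample only approach \(\ld_1\) and \(\sg_1\)):

    import numpy as np

    A = np.array([[3.0, 0.0],
                  [4.0, 5.0]])
    S = A.T @ A

    rng = np.random.default_rng(2)
    X = rng.standard_normal((2, 100000))            # many random nonzero x's in the columns

    rayleigh = np.einsum('ij,ij->j', X, S @ X) / np.einsum('ij,ij->j', X, X)
    ratio = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)

    print(rayleigh.max(), np.linalg.eigvalsh(S)[-1])           # both near lambda_1 = 45
    print(ratio.max(), np.linalg.svd(A, compute_uv=False)[0])  # both near sigma_1 = sqrt(45)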

Now we explain why \(\q_2\) and \(\v_2\) are the winning vectors.

Start with any orthogonal matrix \(Q_1\) that has \(\q_1\) in its first column. The other \(n-1\) orthonormal columns just have to be orthogonal to \(\q_1\). Then use \(S\q_1=\ld_1\q_1\):

\[\begin{split}SQ_1=S\bb \q_1\ \q_2\ \cds\ \q_n \eb=\bb \q_1\ \q_2\ \cds\ \q_n \eb \bb \ld_1&\w^T\\\0&S_{n-1} \eb=Q_1\bb \ld_1&\w^T\\\0&S_{n-1} \eb.\end{split}\]

Multiply by \(Q_1^T\), remember \(Q_1^TQ_1=I\), and recognize that \(Q_1^TSQ_1\) is symmetric like \(S\):

\[\begin{split}\rm{The\ symmetry\ of\ }Q_1^TSQ_1=\bb \ld_1&\w^T\\\0&S_{n-1} \eb \rm{\ forces\ }\w=\0\rm{\ and\ }S_{n-1}^T=S_{n-1}.\end{split}\]

The requirement \(\q_1^T\x=0\) has reduced the maximum problem to size \(n-1\). The largest eigenvalue of \(S_{n-1}\) will be the second largest for \(S\). It is \(\ld_2\). The winning vector will be the eigenvector \(\q_2\) with \(S\q_2=\ld_2\q_2\).

Use induction to produce all the eigenvectors \(\q_1,\cds,\q_n\) and their eigenvalues \(\ld_1,\cds,\ld_n\). The Spectral Theorem \(S=Q\Ld Q^T\) is proved even with repeated eigenvalues. All symmetric matrices can be diagonalized.

Computing the Eigenvalues of \(S\) and Singular Values of \(A\)

The first idea is to produce zeros in \(A\) and \(S\) without changing any \(\sg\)’s and \(\ld\)’s.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 7.3 Principal Component Analysis (PCA by the SVD)

For each of \(n\) samples we are measuring \(m\) variables. The data matrix \(A_0\) has \(n\) columns and \(m\) rows.

Graphically, the columns of \(A_0\) are \(n\) points in \(\R^m\). After we subtract the average of each row to reach \(A\), the \(n\) points are often clustered along a line or close to a plane (or another low-dimensional subspace of \(\R^m\)).

Sample covariance matrix:

\[S=\frac{AA^T}{n-1}.\]

\(A\) shows the distance \(a_{ij}-\mu_i\) from each measurement to the row average \(\mu_i\).

The SVD of \(A\) (centered data) shows the dominant direction in the scatter plot.
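A minimal sketch of that connection, assuming NumPy and synthetic two-variable data (not from the text): the eigenvalues of \(S\) are \(\sg_i^2/(n-1)\), and the top eigenvector of \(S\) matches the first left singular vector \(\u_1\) of \(A\) up to sign.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 500
    A0 = rng.multivariate_normal([5.0, -2.0], [[3.0, 2.0], [2.0, 2.0]], size=n).T  # m=2 rows, n columns

    A = A0 - A0.mean(axis=1, keepdims=True)    # subtract each row's average
    S = A @ A.T / (n - 1)                      # sample covariance matrix

    lam, Q = np.linalg.eigh(S)                 # ascending eigenvalues of S
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(np.allclose(s**2 / (n - 1), lam[::-1]))   # True: sigma_i^2/(n-1) are the eigenvalues of S
    print(Q[:, -1], U[:, 0])                        # same dominant direction (possibly opposite sign)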

The Essentials of Principal Component Analysis (PCA)

PCA gives a way to understand a data plot in dimension \(m =\) the number of measured variables. The crucial connection to linear algebra is in the singular values and singular vectors of \(A\). Those come from the eigenvalues \(\ld=\sg^2\) and the eigenvectors \(\u\) of the sample covariance matrix \(S=AA^T/(n-1)\).

  • The total variance in the data is the sum of all eigenvalues and of sample variances \(s^2\):

    Total variance \(T=\sg^2_1+\cds+\sg^2_m=s^2_1+\cds+s^2_m=\) trace (diagonal sum).

  • The first eigenvector \(\u_1\) of \(S\) points in the most significant direction of the data. That direction accounts for (or explains) a fraction \(\sg^2_1/T\) of the total variance.

  • The next eigenvector \(\u_2\) (orthogonal to \(\u_1\)) accounts for a smaller fraction \(\sg^2_2/T\).

  • Stop when those fractions are small. You have the \(R\) directions that explain most of the data. The \(n\) data points are very near an \(R\)-dimensional subspace with basis \(\u_1\) to \(\u_R\). These \(\u\)’s are the principal components in \(m\)-dimensional space.

  • \(R\) is the “effective rank” of \(A\). The true rank \(r\) is probably \(m\) or \(n\): full rank matrix.

Perpendicular Least Squares

The sum of squared distances from the points to the line is a minimum.

Proof: Separate each column \(\a_j\) into its components along the \(\u_1\) line and the \(\u_2\) line:

  • Right triangles \(\dp\sum_{j=1}^n\lv\a_j\rv^2=\sum_{j=1}^n|\a_j^T\u_1|^2+\sum_{j=1}^n|\a_j^T\u_2|^2\).

The sum on the left is fixed by the data points \(\a_j\) (columns of \(A\)). The first sum on the right is \(\u_1^TAA^T\u_1\). So when we maximize that sum in PCA by choosing the eigenvector \(\u_1\), we minimize the second sum. That second sum (squared distances from the data points to the best line) is a minimum for perpendicular least squares.

The Sample Correlation Matrix

If scaling is a problem, we change from covariance matrix \(S\) to correlation matrix \(C\):

A diagonal matrix \(D\) rescales \(A\). Each row of \(DA\) has length \(\sqrt{n-1}\). The sample correlation matrix \(C=DAA^TD/(n-1)\) has 1’s on its diagonal.

Genetic Variation in Europe

The first singular vectors of the SNP matrix of genetic variation almost reproduce a map of Europe.

Eigenfaces

PCA provides a mechanism to recognize geometric/photometric similarity through algebraic means.

Applications of Eigenfaces

The first commercial use of PCA face recognition was for law enforcement and security.

Model Order Reduction

A reduced model tries to identify important states of the system.

Searching the Web

  1. The site may be an authority: Links come in from many sites. Especially from hubs.

  2. The site may be a hub: Links go out to many sites in the list. Especially to authorities.

  • Authority ranking \(x_i^1\): add up \(y_j^0\) for all links into \(i\). Hub ranking \(y_i^1\): add up \(x_j^0\) for all links out from \(i\).

  • Authority: \(\x^2=A^T\y^1=A^TA\x^0\).

  • Hub: \(\y^2=A\x^1=AA^T\y^0\).

When we take powers, the largest eigenvalue \(\sg_1^2\) begins to dominate.

PCA in Finance: The Dynamics of Interest Rates

This application of PCA is to the yield curve for Treasury securities.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 7.4 The Geometry of the SVD

The SVD separates a matrix into three steps: \((\)orthogonal\()\times(\)diagonal\()\times(\)orthogonal\()\). Ordinary words can express the geometry behind it: \((\)rotation\()\times(\)stretching\()\times(\)rotation\()\). \(U\Sg V^T\x\) starts with the rotation to \(V^T\x\). Then \(\Sg\) stretches that vector to \(\Sg V^T\x\), and \(U\) rotates to \(A\x=U\Sg V^T\x\).

\[\begin{split}\bb a&b\\c&d \eb=\bb \cos\theta&-\sin\theta\\\sin\theta&\cos\theta \eb \bb \sg_1\\&\sg_2 \eb\bb \cos\phi&\sin\phi\\-\sin\phi&\cos\phi \eb=U\Sg V^T.\end{split}\]

The four numbers \(a,b,c,d\) in the matrix \(A\) led to four numbers \(\theta,\sg_1,\sg_2,\phi\) in its SVD. The SVD also leads to three important ideas:

  1. The norm \(\lv A\rv\) of a matrix–its maximum growth factor.

  2. The polar decomposition \(A=QS\)–orthogonal \(Q\) times positive definite \(S\).

  3. The pseudoinverse \(A^+\)–the best inverse when the matrix \(A\) is not invertible.

The Norm of a Matrix

Note

The norm \(\lv A\rv\) is the largest ratio \(\dp \frac{\lv A\x\rv}{\lv\x\rv}\): \(\dp\lv A\rv=\max_{\x\neq\0}\frac{\lv A\x\rv}{\lv\x\rv}=\sg_1\)

Two valuable properties of that number norm(\(A\)) come directly from its definition:

Note

  • Triangle inequality: \(\lv A+B \rv\leq\lv A \rv+\lv B\rv\).

  • Product inequality: \(\lv AB \rv\leq\lv A \rv\lv B \rv\).

For vectors:

\[\lv(A+B)\x\rv\leq\lv A\x\rv+\lv B\x\rv\leq\lv A \rv\lv\x\rv+\lv B\rv\lv\x\rv.\]

Divide this by \(\lv\x\rv\). Take the maximum over all \(\x\). Then \(\lv A+B \rv\leq\lv A \rv+\lv B \rv\).

The product inequality comes quickly from \(\lv AB\x \rv\leq\lv A\rv\lv B\x\rv\leq\lv A\rv\lv B\rv\lv \x\rv\). Again divide by \(\lv \x \rv\). Take the maximum over all \(\x\). The result is \(\lv AB \rv\leq\lv A\rv\lv B\rv\).
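A quick check of the norm and both inequalities, a minimal sketch assuming NumPy and random matrices (not from the text):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((4, 3))

    def norm2(M):
        return np.linalg.svd(M, compute_uv=False)[0]    # ||M|| = largest singular value sigma_1

    print(np.isclose(norm2(A), np.linalg.norm(A, 2)))   # NumPy's matrix 2-norm is also sigma_1
    print(norm2(A + B) <= norm2(A) + norm2(B))          # triangle inequality
    print(norm2(A @ B.T) <= norm2(A) * norm2(B.T))      # product inequality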

The closest rank \(k\) matrix to \(A\) is \(A_k=\sg_1\u_1\v_1^T+\cds+\sg_k\u_k\v_k^T\).

This is the key fact in matrix approximation: The Eckart-Young-Mirsky Theorem says that \(\lv A-B \rv\geq\lv A-A_k\rv=\sg_{k+1}\) for all matrices \(B\) of rank \(k\).

The \(\v\)’s and \(\u\)’s give orthonormal bases for the four fundamental subspaces, and the first \(k\) \(\v\)’s, \(\u\)’s, and \(\sg\)’s give the best matrix approximation to \(A\).

Polar Decomposition \(A=QS\)

Every complex number \(x+iy\) has the polar form \(re^{i\th}\). A number \(r\geq 0\) multiplies a number \(e^{i\th}\) on the unit circle. We have \(x+iy=r\cos\th+ir\sin\th=r(\cos\th+i\sin\th)=re^{i\th}\). Think of these numbers as 1 by 1 matrices. Then \(e^{i\th}\) is an orthogonal matrix \(Q\) and \(r\geq 0\) is a positive semidefinite matrix (call it \(S\)). The polar decomposition extends the same idea to \(n\) by \(n\) matrices: orthogonal times positive semidefinite, \(A=QS\).

Note

Every real square matrix can be factored into \(A=QS\), where \(Q\) is orthogonal and \(S\) is symmetric positive semidefinite. If \(A\) is invertible, \(S\) is positive definite.

For the proof we just insert \(V^TV=I\) into the middle of the SVD:

Polar decomposition:

\[A=U\Sg V^T=(UV^T)(V\Sg V^T)=(Q)(S).\]

If \(A\) is invertible then \(\Sg\) and \(S\) are also invertible. \(S\) is the symmetric positive definite square root of \(A^TA\), because \(S^2=V\Sg^2V^T=A^TA\). So the eigenvalues of \(S\) are the singular values of \(A\). The eigenvectors of \(S\) are the singular vectors \(\v\) of \(A\).

There is also a polar decomposition \(A=KQ\) in the reverse order. \(Q\) is the same but now \(K=U\Sg U^T\). Then \(K\) is the symmetric positive definite square root of \(AA^T\).

\(Q=UV^T\) is the nearest orthogonal matrix to \(A\). This \(Q\) makes the norm \(\lv Q-A \rv\) as small as possible. That corresponds to the fact that \(e^{i\th}\) is the nearest number on the unit circle to \(re^{i\th}\).
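A minimal sketch of the polar factors, assuming NumPy and reusing the matrix \(A=\bb 3&0\\4&5 \eb\) from Chapter 7.2:

    import numpy as np

    A = np.array([[3.0, 0.0],
                  [4.0, 5.0]])

    U, s, Vt = np.linalg.svd(A)
    Q = U @ Vt                        # orthogonal factor Q = U V^T
    S = Vt.T @ np.diag(s) @ Vt        # symmetric positive definite factor S = V Sigma V^T

    print(np.allclose(Q @ S, A))             # A = QS
    print(np.allclose(Q.T @ Q, np.eye(2)))   # Q is orthogonal
    print(np.allclose(S @ S, A.T @ A))       # S is the square root of A^T A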

Note

The nearest singular matrix \(A_0\) to \(A\) comes by changing the smallest \(\sg_{\min}\) to zero.

The Pseudoinverse \(A^+\)

Pseudoinverse of \(A\):

\[\begin{split}A^+=V\Sg^+U^T=\bb \\\ \v_1\cds\v_r\cds\v_n \\\ \eb \bb \sg_1^{-1}\\&\dds\\&&\sg_r^{-1}\\&&& \eb \bb \\\ \u_1\cds\u_r\cds\u_m \\\ \eb^T.\end{split}\]

The pseudoinverse \(A^+\) is an \(n\) by \(m\) matrix. If \(A^{-1}\) exists, then \(A^+\) is the same as \(A^{-1}\). In that case \(m=n=r\) and we are inverting \(U\Sg V^T\) to get \(V\Sg^{-1}U^T\). The new symbol \(A^+\) is needed when \(r<m\) or \(r<n\). Then \(A\) has no two-sided inverse, but it has a pseudoinverse \(A^+\) with that same rank \(r\):

\[A^+\u_i=\frac{1}{\sg_i}\v_i\quad\rm{for\ }i\leq r\quad\rm{and}\quad A^+\u_i=\0\quad\rm{for\ }i>r.\]

Note

Trying for \(AA\im=A\im A=I\) (a numerical check follows this note):

  • \(AA^+=\) projection matrix onto the column space of \(A\).

  • \(A^+A=\) projection matrix onto the row space of \(A\).
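Here is the numerical check promised above, a minimal sketch assuming NumPy and a sample rank 1 matrix \(A\) that is not from the text; np.linalg.pinv computes \(V\Sg^+U^T\).

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [0.0, 0.0]])       # rank 1, so A has no two-sided inverse

    A_plus = np.linalg.pinv(A)       # pseudoinverse V Sigma^+ U^T

    P_col = A @ A_plus               # projection onto the column space of A
    P_row = A_plus @ A               # projection onto the row space of A
    print(np.allclose(P_col @ P_col, P_col), np.allclose(P_col.T, P_col))
    print(np.allclose(P_row @ P_row, P_row), np.allclose(P_row.T, P_row))
    print(np.allclose(A @ A_plus @ A, A))    # A^+ inverts A on its column space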

Least Squares with Dependent Columns

Note

\(\x^+=A^+\b=(1,1)\) is the shortest solution to \(A^TA\wh{\x}=A^T\b\) and \(A\wh{\x}=\p\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 8 Linear Transformations

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 8.1 The Idea of a Linear Transformation

Note

A transformation \(T\) assigns an output \(T(\v)\) to each input vector \(\v\) in \(\bs{V}\). The transformation is linear if it meets these requirements for all \(\v\) and \(\w\):

  • \(T(\v+\w)=T(\v)+T(\w)\).

  • \(T(c\v)=cT(\v)\) for all \(c\).

If the input is \(\v=\0\), the output must be \(T(\v)=\0\).

Note

Linear transformation: \(T(c\v+d\w)\) must equal \(cT(\v)+dT(\w)\).

Shift is not linear:

\[\v+\w+\u_0\quad\rm{is\ not}\quad T(\v)+T(\w)=(\v+\u_0)+(\w+\u_0).\]

The exception is when \(\u_0=\0\). The transformation reduces to \(T(\v)=\v\). This is the identity transformation.

The linear-plus-shift transformation \(T(\v)=A\v+\u_0\) is called “affine”.

Lines to Lines, Triangles to Triangles, Basis Tells All

Linearity: Equally spaced points go to equally spaced points.

Note

Linearity: \(\u=c_1\v_1+c_2\v_2+\cds+c_n\v_n\) must transform to \(T(\u)=c_1T(\v_1)+c_2T(\v_2)+\cds+c_nT(\v_n)\).

Note

Suppose you know \(T(\v)\) for all vectors \(\v_1,\cds,\v_n\) in a basis, then you know \(T(\u)\) for every vector \(\u\) in the space.

The Fundamental Theorem of Calculus says that integration is the (pseudo)inverse of differentiation. For linear algebra, the matrix \(A^+\) is the (pseudo)inverse of the matrix \(A\). The derivative of a constant function is zero. That zero is on the diagonal of \(A^+A\). Calculus wouldn’t be calculus without that 1-dimensional nullspace of \(T=d/dx\).

Examples of Transformations (mostly linear)

All linear transformations from \(V=R^n\) to \(W=R^m\) are produced by matrices.

For a matrix, the column space contains all outputs \(A\v\). The nullspace contains all inputs for which \(A\v=\0\).

  • Range of \(T =\) set of all outputs \(T(\v)\). Range corresponds to column space.

  • Kernel of \(T =\) set of all inputs for which \(T(\v)=\0\). Kernel corresponds to nullspace.

Linear Transformations of the Plane

Refer to the textbook Page 406.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 8.2 The Matrix of a Linear Transformation

For ordinary column vectors, the input \(\v\) is in \(\bs{\rm{V}}=\R^n\) and the output \(T(\v)\) is in \(\bs{\rm{W}}=\R^m\). The matrix \(A\) for this transformation will be \(m\) by \(n\). Our choice of bases in \(\bs{\rm{V}}\) and \(\bs{\rm{W}}\) will decide \(A\).

All vector spaces \(\bs{\rm{V}}\) and \(\bs{\rm{W}}\) have bases. Each choice of those bases leads to a matrix for \(T\). When the input basis is different from the output basis, the matrix for \(T(\v)=\v\) will not be the identity \(I\). It will be the “change of basis matrix”.

Note

Suppose we know \(T(\v)\) for the input basis vectors \(\v_1\) to \(\v_n\). Columns \(1\) to \(n\) of the matrix will contain those outputs \(T(\v_1)\) to \(T(\v_n)\). \(A\) times \(c =\) matrix times vector \(=\) combination of those \(n\) columns. \(Ac\) is the correct combination \(c_1T(\v_1)+\cds+c_nT(\v_n)=T(\v)\).

Reason: Every \(\v\) is a unique combination \(c_1\v_1+\cds+c_n\v_n\) of the basis vectors \(\v_j\). Since \(T\) is a linear transformation, \(T(\v)\) must be the same combination \(c_1T(\v_1)+\cds+c_nT(\v_n)\) of the outputs \(T(\v_j)\) in the columns.

Change of Basis

Suppose the input space \(\bs{\rm{V}}=\R^2\) is also the output space \(\bs{\rm{W}}=\R^2\). Suppose that \(T(\v)=\v\) is the identity transformation. The matrix is \(I\) only when the input basis is the same as the output basis.

For this special case \(T(\v)=\v\), call the matrix \(B\) instead of \(A\). We are just changing basis from the \(\v\)’s to the \(\w\)’s. Each \(\v\) is a combination of \(\w_1\) and \(\w_2\).

Input basis:

\[\begin{split}\bb \\\ \v_1&\v_2 \\\ \eb=\bb 3&6\\3&8 \eb\end{split}\]

Output basis:

\[\begin{split}\bb \\\ \w_1&\w_2 \\\ \eb=\bb 3&0\\1&2 \eb\end{split}\]

Change of basis:

\[\begin{split}\begin{matrix} \v_1=1\w_1+1\w_2\\ \v_2=2\w_1+3\w_2\end{matrix}\end{split}\]

We apply the identity transformation \(T\) to each input basis vector: \(T(\v_1)=\v_1\) and \(T(\v_2)=\v_2\). Then we write those outputs \(\v_1\) and \(\v_2\) in the output basis \(\w_1\) and \(\w_2\). \(WB=V\) so \(B=W\im V\).

Matrix \(B\) for change of basis:

\[\begin{split}\bb \\\ \w_1&\w_2 \\\ \eb\bb \\\ B \\\ \eb=\bb \\\ \v_1&\v_2 \\\ \eb\quad \rm{is}\quad\bb 3&0\\1&2 \eb\bb 1&2\\1&3 \eb=\bb 3&6\\3&8 \eb.\end{split}\]
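A one-line confirmation of \(B=W\im V\) for this example, a minimal sketch assuming NumPy:

    import numpy as np

    V = np.array([[3.0, 6.0],    # input basis vectors v1, v2 in the columns
                  [3.0, 8.0]])
    W = np.array([[3.0, 0.0],    # output basis vectors w1, w2 in the columns
                  [1.0, 2.0]])

    B = np.linalg.solve(W, V)    # change of basis matrix B = W^{-1} V
    print(B)                     # [[1. 2.] [1. 3.]]
    print(np.allclose(W @ B, V)) # True: WB = V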

Note

When the input basis is in the columns of a matrix \(V\), and the output basis is in the columns of \(W\), the change of basis matrix for \(T=I\) is \(B=W\im V\).

The key: Suppose the same vector \(\u\) is written in the input basis of \(\v\)’s and the output basis of \(\w\)’s.

\[\begin{split}\begin{matrix}\u=c_1\v_1+\cds+c_n\v_n\\\u=d_1\w_1+\cds+d_n\w_n\end{matrix} \rm{\ is\ }\bb \\\ \v_1\cds\v_n \\\ \eb\bb c_1\\\vds\\c_n \eb= \bb \\\ \w_1\cds\w_n \\\ \eb\bb d_1\\\vds\\d_n \eb\rm{\ and\ } V\bs{c}=W\bs{d}.\end{split}\]

The coefficients \(\bs{d}\) in the new basis of \(\w\)’s are \(\bs{d}=W\im V\bs{c}\). Then \(B\) is \(W\im V\).

\(\bb x\\y \eb\) in the standard basis has coefficients \(\bb \\\ \w_1&\w_2\\\ \eb\im\bb x\\y \eb\) in the \(\w_1,\w_2\) basis.

Construction of the Matrix

Suppose \(T\) transforms the space \(\bs{\rm{V}}\) (\(n\) -dimensional) to the space \(\bs{\rm{W}}\) (\(m\)-dimensional). We choose a basis \(\v_1,\cds,\v_n\) for \(\bs{\rm{V}}\) and we choose a basis \(\w_1,\cds,\w_m\) for \(\bs{\rm{W}}\). The matrix \(A\) will be \(m\) by \(n\). To find the first column of \(A\), apply \(T\) to the first basis vector \(\v_1\). The output \(T(\v_1)\) is in \(\bs{\rm{W}}\).

Note

\(T(\v_1)\) is a combination \(a_{11}\w_1+\cds+a_{m1}\w_m\) of the output basis for \(\bs{\rm{W}}\).

These numbers \(a_{11},\cds,a_{m1}\) go into the first column of \(A\). Transforming \(\v_1\) to \(T(\v_1)\) matches multiplying \((1,0,\cds,0)\) by \(A\). It yields that first column of the matrix. When \(T\) is the derivative and the first basis vector is 1, its derivative is \(T(\v_1)=\0\). So for the derivative matrix below, the first column of \(A\) is all zero.

Note

Key rule: The \(j\)th column of \(A\) is found by applying \(T\) to the \(j\)th basis vector \(\v_j\):

  • \(T(\v_j)=\) combination of output basis vectors \(=a_{1j}\w_1+\cds+a_{mj}\w_m\).

These numbers \(a_{ij}\) go into \(A\). The matrix is constructed to get the basis vectors right. Then linearity gets all other vectors right. Every \(\v\) is a combination \(c_1\v_1+\cds+c_n\v_n\), and \(T(\v)\) is a combination of the \(\w\)’s. When \(A\) multiplies the vector \(\bs{c}=(c_1,\cds,c_n)\) in the \(\v\) combination, \(A\bs{c}\) produces the coefficients in the \(T(\v)\) combination. This is because matrix multiplication (combining columns) is linear like \(T\).

The matrix \(A\) tells us what \(T\) does. Every linear transformation from \(\bs{\rm{V}}\) to \(\bs{\rm{W}}\) can be converted to a matrix. This matrix depends on the bases.

If you integrate a function and then differentiate, you get back to the start. But if you differentiate before integrating, the constant term is lost. Differentiating and then integrating corresponds to the matrix product \(A^+A\), which is not the identity.
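
A small sketch of that statement, assuming the polynomial bases \(1,x,x^2,x^3\) for the input and \(1,x,x^2\) for the output of the derivative (these particular bases are an illustrative choice, not fixed by the text):

```python
import numpy as np

# Derivative matrix A: input basis 1, x, x^2, x^3  ->  output basis 1, x, x^2
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])

# Integration matrix A_plus: 1, x, x^2  ->  1, x, x^2, x^3 (constant of integration = 0)
A_plus = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0/3.0]])

print(A @ A_plus)   # identity: integrate, then differentiate -> back to the start
print(A_plus @ A)   # not identity: the constant term (first coefficient) is lost
```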

Matrix Products \(AB\) Match Transformations \(TS\)

When we apply the transformation \(T\) to the output from \(S\), we get \(TS\) by this rule: \((TS)(\u)\) is defined to be \(T(S(\u))\). The output \(S(\u)\) becomes the input to \(T\).

When we apply the matrix \(A\) to the output from \(B\), we multiply \(AB\) by this rule: \((AB)(\x)\) is defined to be \(A(B\x)\). The output \(B\x\) becomes the input to \(A\). Matrix multiplication gives the correct matrix \(AB\) to represent \(TS\).

Note

Multiplication: The linear transformation \(TS\) starts with any vector \(\u\) in \(\bs{\rm{U}}\), goes to \(S(\u)\) in \(\bs{\rm{V}}\) and then to \(T(S(\u))\) in \(\bs{\rm{W}}\). The matrix \(AB\) starts with any \(\x\) in \(\R^p\), goes to \(B\x\) in \(\R^n\) and then to \(AB\x\) in \(\R^m\). The matrix \(AB\) correctly represents \(TS\):

  • \(TS: \bs{\rm{U}}\rightarrow\bs{\rm{V}}\rightarrow\bs{\rm{W}}\quad\) \(AB: (m\rm{\ by\ }n)(n\rm{\ by\ }p)=(m\rm{\ by\ }p)\)

Product of transformations \(TS\) matches product of matrices \(AB\).

Choosing the Best Bases

Choose bases that diagonalize the matrix. With the standard basis (the columns of \(I\)) our transformation \(T\) produces some matrix \(A\)–probably not diagonal. That same \(T\) is represented by different matrices when we choose different bases. The two great choices are eigenvectors and singular vectors:

Eigenvectors: If \(T\) transforms \(\R^n\) to \(\R^n\), its matrix \(A\) is square. But using the standard basis, that matrix \(A\) is probably not diagonal. If there are \(n\) independent eigenvectors, choose those as the input and output basis. In this good basis, the matrix for \(T\) is the diagonal eigenvalue matrix \(\Ld\).

Note

\(A_{\rm{new}}=B\im AB\) in the new basis of \(\b\) ‘s is similar to \(A\) in the standard basis:

  • \(A_{\b\rm{'s\ to\ }\b\rm{'s}}=B\im_{\rm{standard\ to\ }\b\rm{'s}}A_{\rm{standard}}B_{\b\rm{'s\ to\ standard}}\)
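
A numerical illustration of this similarity (a sketch; the 2 by 2 matrix below is an arbitrary example with independent eigenvectors):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # eigenvalues 5 and 2

eigvals, X = np.linalg.eig(A)         # columns of X = eigenvectors (the new basis B)
A_new = np.linalg.inv(X) @ A @ X      # B^{-1} A B in the eigenvector basis
print(np.round(A_new, 10))            # diagonal matrix Lambda of eigenvalues
```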

Finally we allow different spaces \(V\) and \(W\), and different bases \(\v\)’s and \(\w\)’s.

Singular vectors: The SVD says that \(U\im AV=\Sg\). The right singular vectors \(\v_1,\cds,\v_n\) will be the input basis. The left singular vectors \(\u_1,\cds,\u_m\) will be the output basis. By the rule for matrix multiplication, the matrix for the same transformation in these new bases is \(B\im_{\rm{out}}AB_{\rm{in}}=U\im AV=\Sg\).

\(\Sg\) is “isometric” to \(A\): \(C=Q_1\im AQ_2\) is isometric to \(A\) if \(Q_1\) and \(Q_2\) are orthogonal.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 8.3 The Search for a Good Basis

The input basis vectors will be the columns of \(B_{\rm{in}}\). The output basis vectors will be the columns of \(B_{\rm{out}}\).

Pure algebra: If \(A\) is the matrix for a transformation \(T\) in the standard basis, then \(B\im_{\rm{out}}AB_{\rm{in}}\) is the matrix in the new bases.

The standard basis vectors are the columns of the identity: \(B_{\rm{in}}=I_{n\times n}\) and \(B_{\rm{out}}=I_{m\times m}\). Now we are choosing special bases to make the matrix clearer and simpler than \(A\). When \(B_{\rm{in}}=B_{\rm{out}}=B\), the square matrix \(B\im AB\) is similar to \(A\).

Applied algebra: Applications are all about choosing good bases.

1. \(B_{\rm{in}}=B_{\rm{out}}=\) eigenvector matrix \(X\).

Then \(X\im AX=\) eigenvalues in \(\Ld\). This choice requires \(A\) to be a square matrix with \(n\) independent eigenvectors. We get \(\Ld\) when \(B_{\rm{in}}=B_{\rm{out}}\) is the eigenvector matrix \(X\).

2. \(B_{\rm{in}}=V\) and \(B_{\rm{out}}=U\): singular vectors of \(A\).

Then \(U\im AV=\) diagonal \(\Sg\). \(\Sg\) is the singular matrix (with \(\sg_1,\cds,\sg_r\) on its diagonal) when \(B_{\rm{in}}\) and \(B_{\rm{out}}\) are the singular vector matrices \(V\) and \(U\). Recall that those columns of \(B_{\rm{in}}\) and \(B_{\rm{out}}\) are orthonormal eigenvectors of \(A^TA\) and \(AA^T\). Then \(A=U\Sg V^T\) gives \(\Sg=U\im AV\).
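
A quick check of \(\Sg=U\im AV\) with NumPy’s SVD (a sketch; the 3 by 2 matrix is random, and \(U\im=U^T\) because \(A\) is real):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))

U, s, Vh = np.linalg.svd(A)   # A = U * Sigma * V^T
Sigma = U.T @ A @ Vh.T        # B_out^{-1} A B_in with B_out = U, B_in = V
print(np.round(Sigma, 10))    # diagonal entries are the singular values
print(s)
```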

3. \(B_{\rm{in}}=B_{\rm{out}}=\) generalized eigenvectors of \(A\). Then \(B\im AB=\) Jordan form \(J\).

\(A\) is a square matrix but it may only have \(s\) independent eigenvectors. (If \(s=n\) then \(B\) is \(X\) and \(J\) is \(\Ld\).) In all cases Jordan constructed \(n-s\) additional “generalized” eigenvectors, aiming to make the Jordan form \(J\) as diagonal as possible:

  1. There are \(s\) square blocks along the diagonal of \(J\).

  2. Each block has one eigenvalue \(\ld\), one eigenvector, and 1’s above the diagonal.

The good case has \(n\) \(1\times 1\) blocks, each containing an eigenvalue. Then \(J\) is \(\Ld\) (diagonal).

The Jordan Form

For every \(A\), we want to choose \(B\) so that \(B\im AB\) is as nearly diagonal as possible. When \(A\) has a full set of \(n\) eigenvectors, they go into the columns of \(B\). Then \(B=X\). The matrix \(X\im AX\) is diagonal. This is the Jordan form of \(A\)–when \(A\) can be diagonalized. In the general case, eigenvectors are missing and \(\Ld\) can’t be reached.

Suppose \(A\) has \(s\) independent eigenvectors. Then it is similar to a Jordan matrix with \(s\) blocks. Each block has an eigenvalue on the diagonal with 1’s just above it. This block accounts for exactly one eigenvector of \(A\). Then \(B\) contains generalized eigenvectors as well as ordinary eigenvectors.

When there are \(n\) eigenvectors, all \(n\) blocks will be 1 by 1. In that case \(J=\Ld\).

The Jordan form solves the differential equation \(d\u/dt=A\u\) for any square matrix \(A=BJB\im\). The solution \(e^{At}\u(0)\) becomes \(\u(t)=Be^{Jt}B\im\u(0)\). \(J\) is triangular and its matrix exponential \(e^{Jt}\) involves \(e^{\ld t}\) times powers \(1,t,\cds,t^{s-1}\).

Note

Jordan form: If \(A\) has \(s\) independent eigenvectors, it is similar to a matrix \(J\) that has \(s\) Jordan blocks \(J_1,\cds,J_s\) on its diagonal. Some matrix \(B\) puts \(A\) into Jordan form:

  • Jordan form: \(B\im AB=\bb J_1\\&\dds\\&&J_s \eb=J\).

Each block \(J_i\) has one eigenvalue \(\ld_i\), one eigenvector, and \(1\)’s just above the diagonal:

  • Jordan block: \(J_i=\bb \ld_i&1\\&\dds&\dds\\&&\dds&1\\&&&\ld_i \eb\).

Matrices are similar if they share the same Jordan form \(J\), and not otherwise.

Question: Find the eigenvalues and all possible Jordan forms if \(A^2=\) zero matrix.

Answer: The eigenvalues must all be zero, because \(A\x=\ld\x\) leads to \(A^2\x=\ld^2\x=0\x\). The Jordan form of \(A\) has \(J^2=0\) because \(J^2=(B\im AB)(B\im AB)=B\im A^2B=0\). Every block in \(J\) has \(\ld=0\) on the diagonal. Look at \(J^2_k\) for block sizes \(1,2,3\):

\[\begin{split}\bb 0 \eb^2=\bb 0 \eb\quad\bb 0&1\\0&0 \eb^2=\bb 0&0\\0&0 \eb\quad \bb 0&1&0\\0&0&1\\0&0&0 \eb^2=\bb 0&0&1\\0&0&0\\0&0&0 \eb.\end{split}\]

Conclusion: If \(J^2=0\) then all block sizes must be 1 or 2. \(J^2\) is not zero for 3 by 3.

The rank of \(J\) (and \(A\)) will be the total number of 1’s. The maximum rank is \(n/2\). This happens when there are \(n/2\) blocks, each of size 2 and rank 1.
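
This conclusion can be checked on a small example with SymPy’s jordan_form (a sketch; the matrix below is one hypothetical choice with \(A^2=0\)):

```python
from sympy import Matrix

# A hypothetical matrix with A^2 = 0 (rank 1, not already in Jordan form)
A = Matrix([[1, -1],
            [1, -1]])
print(A**2)               # zero matrix

B, J = A.jordan_form()    # A = B J B^{-1}
print(J)                  # one 2x2 Jordan block with eigenvalue 0: [[0, 1], [0, 0]]
```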

4. \(B_{\rm{in}}=B_{\rm{out}}=\) Fourier matrix \(F\). Then \(F\x\) is a Discrete Fourier Transform of \(\x\).

We are starting with the eigenvectors \((1,\ld,\ld^2,\ld^3)\) and finding the matrices that have those eigenvectors:

\[\begin{split}\rm{If\ }\ld^4=1\rm{\ then}\quad P\x=\bb 0&1&0&0\\0&0&1&0\\0&0&0&1\\1&0&0&0\eb \bb 1\\\ld\\\ld^2\\\ld^3 \eb=\ld\bb 1\\\ld\\\ld^2\\\ld^3 \eb=\ld\x.\end{split}\]

The eigenvector matrix \(F\) diagonalizes the permutation matrix \(P\):

  • Eigenvalue matrix \(\Ld\):

\[\begin{split}\bb 1\\&i\\&&-1\\&&&-i \eb\end{split}\]
  • Eigenvector matrix is Fourier matrix \(F\):

\[\begin{split}\bb 1&1&1&1\\1&i&-1&-i\\1&i^2&1&(-i)^2\\1&i^3&-1&(-i)^3 \eb.\end{split}\]
\[\begin{split}P^2\x=\bb 0&1&0&0\\0&0&1&0\\0&0&0&1\\1&0&0&0\eb \bb 1\\\ld\\\ld^2\\\ld^3 \eb=\ld^2\bb 1\\\ld\\\ld^2\\\ld^3 \eb=\ld^2\x \rm{\ when\ }\ld^4=1.\end{split}\]

The fourth power is special because \(P^4=I\). If \(P\) and \(P^2\) and \(P^3\) and \(P^4=I\) have the same eigenvector matrix \(F\), so does any combination \(C=c_1P+c_2P^2+c_3P^3+c_0I\):

  • Circulant matrix:

\[\begin{split}C=\bb c_0&\bs{c_1}&c_2&c_3\\c_3&c_0&\bs{c_1}&c_2\\c_2&c_3&c_0&\bs{c_1}\\\bs{c_1}&c_2&c_3&c_0 \eb\end{split}\]
  • The four eigenvalues of \(C\) are given by the Fourier transform \(F\bs{c}\):

\[\begin{split}F\bs{c}=\bb 1&1&1&1\\1&i&-1&-i\\1&-1&1&-1\\1&-i&-1&i \eb \bb c_0\\c_1\\c_2\\c_3 \eb=\bb c_0+c_1+c_2+c_3\\c_0+ic_1-c_2-ic_3\\ c_0-c_1+c_2-c_3\\c_0-ic_1-c_2+ic_3 \eb.\end{split}\]
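
A numerical confirmation that the columns of \(F\) are eigenvectors of \(C\) with eigenvalues \(F\bs{c}\) (a sketch; the vector \(c\) below is an arbitrary choice):

```python
import numpy as np

c = np.array([2.0, 1.0, -1.0, 3.0])                 # arbitrary c0, c1, c2, c3

P = np.eye(4, k=1) + np.eye(4, k=-3)                # cyclic permutation matrix
C = sum(c[m] * np.linalg.matrix_power(P, m) for m in range(4))   # circulant matrix

w = np.exp(2j * np.pi / 4)                          # w = i
F = np.array([[w**(j*k) for k in range(4)] for j in range(4)])   # Fourier matrix

# Columns of F are eigenvectors of C; the eigenvalues are the entries of F @ c
print(np.allclose(C @ F, F @ np.diag(F @ c)))       # True
```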

Notice that circulant matrices have constant diagonals. Constancy down the diagonals is a crucial property of \(C\). It corresponds to constant coefficients in a differential equation.

The equation \(\dp\frac{d^2u}{dt^2}=-u\) is solved by \(u=c_0\cos t+c_1\sin t\).

The equation \(\dp\frac{d^2u}{dt^2}=tu\) cannot be solved by elementary functions.

Bases for Function Space

If we had vectors instead of functions, the test for a good basis would look at \(B^TB\). This matrix contains all inner products between the basis vectors (columns of \(B\)). The basis is orthonormal when \(B^TB=I\). That is best possible. But the basis \(1,x,x^2,\cds\) produces the evil Hilbert matrix: \(B^TB\) has an enormous ratio between its largest and smallest eigenvalues.

Note: Now the columns of \(B\) are functions instead of vectors. We still use \(B^TB\) to test for independence. So we need to know the dot product (inner product is a better name) of two functions–those are the numbers in \(B^TB\). The inner product of functions will integrate instead of adding:

Inner product \((\bs{f},\bs{g})=\int f(x)g(x)dx\)

Complex inner product \((\bs{f},\bs{g})=\int\bar{f(x)}g(x)dx\), \(\bar{f}=\) complex conjugate

Weighted inner product \((\bs{f},\bs{g})_w=\int w(x)\bar{f(x)}g(x)dx\), \(w=\) weight functions

When the integrals go from \(x=0\) to \(x=1\), the inner product of \(x^i\) with \(x^j\) is

\[\int^1_0 x^ix^jdx=\frac{x^{i+j+1}}{i+j+1}\bigg]^{x=1}_{x=0}=\frac{1}{i+j+1}= \rm{\ entries\ of\ Hilbert\ matrix\ }B^TB\]
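
The trouble with this basis shows up immediately in the condition number of the Hilbert matrix (a short sketch; \(n=6\) is an arbitrary size):

```python
import numpy as np

n = 6
hilbert = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # B^T B
print(np.linalg.cond(hilbert))   # roughly 1.5e7 already for n = 6: a very poor basis
```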

By changing to the symmetric interval from \(x=-1\) to \(x=1\), we immediately have orthogonality between all even functions and all odd functions:

Interval [-1, 1]:

\[\int^1_{-1}x^2x^5dx=0\quad\int^1_{-1}\bs{\rm{even}}(x)\bs{\rm{odd}}(x)dx=0.\]

Orthogonal Bases for Function Space

Here are the three leading even-odd bases for theoretical and numerical computations:

Note

5. The Fourier basis: \(1,\sin x,\cos x,\sin 2x,\cos 2x,\cds\)

6. The Legendre basis: \(1,x,x^2-\frac{1}{3},x^3-\frac{3}{5}x,\cds\)

7. The Chebyshev basis: \(1,x,2x^2-1,4x^3-3x,\cds\)

The Fourier basis functions (sines and cosines) are all periodic. They repeat over every \(2\pi\) interval because \(\cos(x+2\pi)=\cos x\) and \(\sin(x+2\pi)=\sin x\). This basis is also orthogonal. Every sine and cosine is orthogonal to every other sine and cosine. The sine-cosine basis is also excellent for approximation.

The Fourier transform connects \(f(x)\) to the coefficients \(a_k\) and \(b_k\) in its Fourier series:

Note

Fourier series: \(f(x)=a_0+b_1\sin x+a_1\cos x+b_2\sin 2x+a_2\cos 2x+\cds\)

We see that function space is infinite-dimensional. It takes infinitely many basis functions to capture perfectly a typical \(f(x)\). But the formula for each coefficient (for example \(a_3\)) is just like the formula \(\b^T\a/\a^T\a\) for projecting a vector \(\b\) onto the line through \(\a\).

Here we are projecting the function \(f(x)\) onto the line in function space through \(\cos 3x\):

Fourier coefficient:

\[a_3=\frac{(f(x),\cos 3x)}{(\cos 3x,\cos 3x)}=\frac{\int f(x)\cos 3xdx}{\int \cos 3x\cos 3xdx}.\]

Fourier series is just linear algebra in function space.

Legendre Polynomials and Chebyshev Polynomials

The Legendre polynomials are the result of applying the Gram-Schmidt idea. The plan is to orthogonalize the powers \(1,x,x^2,\cds\).

\[\frac{(x^2,1)}{(1,1)}=\frac{\int x^2dx}{\int 1dx}=\frac{2/3}{2}=\frac{1}{3}\]

Gram-Schmidt gives \(\dp x^2-\frac{1}{3}=\) Legendre.

\[\frac{(x^3,x)}{(x,x)}=\frac{\int x^4dx}{\int x^2dx}=\frac{2/5}{2/3}=\frac{3}{5}\]

Gram-Schmidt gives \(\dp x^3-\frac{3}{5}x=\) Legendre
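
These Gram-Schmidt coefficients can be verified by symbolic integration over \([-1,1]\) (a minimal sketch with SymPy):

```python
from sympy import symbols, integrate

x = symbols('x')

def inner(f, g):
    return integrate(f * g, (x, -1, 1))   # inner product on the interval [-1, 1]

print(inner(x**2, 1) / inner(1, 1))       # 1/3  ->  x^2 - 1/3 is Legendre
print(inner(x**3, x) / inner(x, x))       # 3/5  ->  x^3 - (3/5)x is Legendre
```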

The Chebyshev polynomials \(1,x,2x^2-1,4x^3-3x\) are connected to \(1,\cos\th,\cos 2\th,\cos 3\th\). The connection of Chebyshev to Fourier appears when we set \(x=\cos\th\):

Note

Chebyshev to Fourier:

  • \(\begin{matrix} 2x^2-1=2(\cos\th)^2-1=\cos 2\th\\4x^3-3x=4(\cos\th)^3-3(\cos\th)=\cos 3\th \end{matrix}\).

The \(n^{\rm{th}}\) degree Chebyshev polynomial \(T_n(x)\) converts to Fourier’s \(\cos n\th=T_n(\cos\th)\).
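
A quick numerical check of \(\cos n\th=T_n(\cos\th)\) (a sketch using NumPy’s Chebyshev polynomials of the first kind):

```python
import numpy as np
from numpy.polynomial import Chebyshev

theta = np.linspace(0.0, np.pi, 101)
x = np.cos(theta)

for n in range(5):
    T_n = Chebyshev.basis(n)                             # Chebyshev polynomial T_n(x)
    print(n, np.allclose(T_n(x), np.cos(n * theta)))     # True for every n
```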

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 9 Complex Vectors and Matrices

A complete presentation of linear algebra must include complex numbers \(z=x+iy\). Even when the matrix is real, the eigenvalues and eigenvectors are often complex.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 9.1 Complex Numbers

Note

A complex number is a real number plus an imaginary number. Addition keeps the real and imaginary parts separate. Multiplication uses \(i^2=-1\):

  • Add: \((3+2i)+(3+2i)=6+4i\)

  • Multiply: \((3+2i)(1-i)=3+2i-3i-2i^2=5-i\).

The real part is \(a=\Re (a+bi)\). The imaginary part is \(b=\Im (a+bi)\).

The Complex Plane

Complex numbers correspond to points in a plane. Real numbers go along the \(x\) axis. Plane imaginary numbers are on the \(y\) axis. The complex number \(3+2i\) is at the point with coordinates \((3,2)\). The number zero, which is \(0+0i\), is at the origin.

The conjugate of \(z=a+bi\) is \(\bar{z}=a-bi\). The imaginary parts of \(z\) and \(\bar{z}\) have opposite signs. In the complex plane, \(\bar{z}\) is the image of \(z\) on the other side of the real axis.

When we multiply conjugates \(\bar{z}_1\) and \(\bar{z}_2\), we get the conjugate of \(z_1z_2\). And when we add \(\bar{z}_1\) and \(\bar{z}_2\), we get the conjugate of \(z_1+z_2\).

By taking conjugates of \(A\x=\ld\x\), when \(A\) is real we get another eigenvalue \(\bar{\ld}\) and its eigenvector \(\bar{\x}\):

Tip

Eigenvalues \(\ld\) and \(\bar{\ld}\): If \(A\x=\ld\x\) and \(A\) is real then \(A\bar{\x}=\bar{\ld}\bar{\x}\).

\(z+\bar{z}=\) real:

The sum of \(z=a+bi\) and its conjugate \(\bar{z}=a-bi\) is the real number \(2a\). The product of \(z\) times \(\bar{z}\) is the real number \(a^2+b^2\):

Note

Multiply \(z\) times \(\bar{z}\) to get \(|z|^2=r^2\): \((a+bi)(a-bi)=a^2+b^2\)

\[\frac{1}{z}=\frac{1}{a+ib}=\frac{1}{a+ib}\frac{a-ib}{a-ib}=\frac{a-ib}{a^2+b^2}\]

In case \(a^2+b^2=1\), this says that \((a+ib)\im\) is \(a-ib\). On the unit circle, \(1/z\) equals \(\bar{z}\).

The Polar Form \(re^{i\theta}\)

The square root of \(a^2+b^2\) is \(|z|\). This is the absolute value (or modulus) of the number \(z=a+ib\). The square root \(|z|\) is also written \(r\), because it is the distance from \(0\) to \(z\). The real number \(r\) in the polar form gives the size of the complex number \(z\):

The absolute value of \(z=a+ib\) is \(|z|=\sqrt{a^2+b^2}\). This is called \(r\).

The angle doubles when the number is squared.

Note

The number \(z=a+ib\) is also \(z=r\cos\th+ir\sin\th\). This is \(re^{i\th}\).

Note: \(\cos\th+i\sin\th\) has absolute value \(r=1\) because \(\cos^2\th+\sin^2\th=1\). Thus \(\cos\th+i\sin\th\) lies on the circle of radius 1–the unit circle.

If \(z\) is at angle \(\th\), its conjugate \(\bar{z}\) is at \(2\pi-\th\) and also at \(-\th\). \(1=e^0=e^{2\pi i}\).

Powers and Products: Polar Form

Note

The \(n\)th power of \(z=r(\cos\th+i\sin\th)\) is \(z^n=r^n(\cos n\th+i\sin n\th)\).

To multiply \(z\) times \(z\pr\), multiply \(r\)‘s and add angles:

\[r(\cos\th+i\sin\th)\rm{\ times\ }r\pr(\cos\th\pr+i\sin\th\pr)=rr\pr(\cos(\th+\th\pr)+i\sin(\th+\th\pr)).\]

One way to understand this is by trigonometry.

\[ \begin{align}\begin{aligned}(\cos\th+i\sin\th)\times(\cos\th+i\sin\th)=\cos^2\th+i^2\sin^2\th+2i\sin\th\cos\th\\=\cos^2\th-\sin^2\th+i2\sin\th\cos\th=\cos 2\th+i\sin 2\th\end{aligned}\end{align} \]

The second way to understand the rule for \(z^n\) is by Euler’s Formula

\[e^x=1+x+\frac{1}{2}x^2+\frac{1}{6}x^3+\cds\rm{\ becomes\ } e^{i\th}=1+i\th+\frac{1}{2}i^2\th^2+\frac{1}{6}i^3\th^3+\cds\]

Write \(-1\) for \(i^2\) to see \(1-\frac{1}{2}\th^2\). The complex number \(e^{i\th}\) is \(\cos\th+i\sin\th\).

Note

Euler’s Formula: \(e^{i\th}=\cos\th+i\sin\th\) gives \(z=r\cos\th+ir\sin\th=re^{i\th}\).

The special choice \(\th=2\pi\) gives \(\cos 2\pi+i\sin 2\pi\) which is 1. Somehow the infinite series \(e^{2\pi i}=1+2\pi i+\frac{1}{2}(2\pi i)^2+\cds\) adds up to 1.

The powers \((re^{i\th})^n\) are equal to \(r^ne^{in\th}\). They stay on the unit circle when \(r=1\) and \(r^n=1\). Then we find \(n\) different numbers whose \(n\)th powers equal 1:

Note

Set \(w=e^{2\pi i/n}\). The \(n\)th powers of \(1,w,w^2,\cds,w^{n-1}\) all equal 1.

Those are the \(n\)th roots of 1. They solve the equation \(z^n=1\). They are equally spaced around the unit circle, where the full \(2\pi\) is divided by \(n\). Multiply their angles by \(n\) to take \(n\)th powers. That gives \(w^n=e^{2\pi i}\) which is 1.
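
A short check of the \(n\)th roots of unity (a sketch; \(n=8\) is an arbitrary choice):

```python
import numpy as np

n = 8
w = np.exp(2j * np.pi / n)                 # w = e^{2*pi*i/n}
roots = w ** np.arange(n)                  # 1, w, w^2, ..., w^{n-1}

print(np.allclose(np.abs(roots), 1.0))     # all on the unit circle
print(np.allclose(roots**n, 1.0))          # every n-th power equals 1
```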

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 9.2 Hermitian and Unitary Matrices

When you transpose a complex vector \(z\) or matrix \(A\), take the complex conjugate too.

Conjugate transpose:

\[\bar{z}^T=\bb \bar{z}_1&\cds&\bar{z}_n \eb=\bb a_1-ib_1&\cds&a_n-ib_n \eb.\]

Length squared: \(\bar{\bs{z}}^T\bs{z}=\lv\bs{z}\rv^2\)

\[\begin{split}\bb \bar{z}_1&\cds&\bar{z}_n \eb\bb z_1\\\vds\\z_n \eb=|z_1|^2+\cds+|z_n|^2.\end{split}\]

Note

The length \(\lv\bs{z}\rv\) is the square root of \(\bar{\bs{z}}^T\bs{z}=\bs{z}^H\bs{z}=|z_1|^2+\cds+|z_n|^2\).

\(\bs{z}^H=\bar{\bs{z}}^T\) is the conjugate transpose of \(\bs{z}\).

\(A^H\) is “\(A\) Hermitian”, the conjugate transpose of \(A\):

\[\begin{split}\rm{If\ }A=\bb 1&i\\0&1+i \eb\rm{\ then\ }A^H=\bb 1&0\\-i&1-i \eb.\end{split}\]

Complex Inner Products

Note

DEFINITION: The inner product of real or complex vectors \(\u\) and \(\v\) is \(\u^H\v\):

  • \(\u^H\v=\bb \bar{u}_1&\cds&\bar{u}_n \eb\bb v_1\\\vds\\v_n \eb=\bar{u}_1v_1+\cds+\bar{u}_nv_n\).

With complex vectors, \(\u^H\v\) is different from \(\v^H\u\). The order of the vectors is now important. In fact \(\v^H\u=\bar{v}_1u_1+\cds+\bar{v}_nu_n\) is the complex conjugate of \(\u^H\v\).

A zero inner product still means that the complex vectors are orthogonal.

The inner product of \(A\u\) with \(\v\) equals the inner product of \(\u\) with \(A^H\v\):

  • \(A^H\) is also called the “adjoint” of \(A\): \((A\u)^H\v=\u^H(A^H\v)\).

Note

The conjugate transpose of \(AB\) is \((AB)^H=B^HA^H\).

Hermitian Matrices \(S=S^H\)

Hermitian matrices: \(S=S^H\). The condition on the entries is \(s_{ij}=\bar{s_{ji}}\). Every real symmetric matrix is Hermitian, because taking its conjugate has no effect.

Note

If \(S=S^H\) and \(\bs{z}\) is any real or complex column vector, the number \(\bs{z}^HS\bs{z}\) is real.

Quick proof: \((\bs{z}^HS\bs{z})^H=\bs{z}^HS^H(\bs{z}^H)^H\) which is \(\bs{z}^HS\bs{z}\) again. So the number \(\bs{z}^HS\bs{z}\) equals its conjugate and must be real.

Note

Every eigenvalue of a Hermitian matrix is real.

Proof: Suppose \(S\bs{z}=\ld\bs{z}\). Multiply both sides by \(\bs{z}^H\) to get \(\bs{z}^HS\bs{z}=\ld\bs{z}^H\bs{z}\). On the left side, \(\bs{z}^HS\bs{z}\) is real. On the right side, \(\bs{z}^H\bs{z}\) is the length squared, real and positive. So the ratio \(\ld=\bs{z}^HS\bs{z}/\bs{z}^H\bs{z}\) is a real number.

Note

The eigenvectors of a Hermitian matrix are orthogonal (when they correspond to different eigenvalues). If \(S\bs{z}=\ld\bs{z}\) and \(S\y=\beta\y\) then \(\y^H\bs{z}=0\).

Proof: Multiply \(S\bs{z}=\ld\bs{z}\) on the left by \(\y^H\). Multiply \(\y^HS^H=\beta\y^H\) on the right by \(\bs{z}\):

\[\y^HS\bs{z}=\ld\y^H\bs{z}\quad\rm{and}\quad\y^HS^H\bs{z}=\beta\y^H\bs{z}.\]

The left sides are equal so \(\ld\y^H\bs{z}=\beta\y^H\bs{z}\). Then \(\y^H\bs{z}\) must be zero.

When \(S\) is real and symmetric, \(X\) is \(Q\)–an orthogonal matrix. Now \(S\) is complex and Hermitian. Its eigenvectors are complex and orthonormal. The eigenvector matrix \(X\) is like \(Q\), but complex: \(Q^HQ=I\).
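
Both facts, real eigenvalues and orthonormal eigenvectors, are easy to see numerically (a sketch; the Hermitian matrix \(S\) below is an arbitrary 2 by 2 example):

```python
import numpy as np

S = np.array([[2.0, 3.0 - 3.0j],
              [3.0 + 3.0j, 5.0]])                # S = S^H (Hermitian)

eigvals, X = np.linalg.eigh(S)                   # eigh assumes a Hermitian matrix
print(eigvals)                                   # real eigenvalues
print(np.allclose(X.conj().T @ X, np.eye(2)))    # eigenvector matrix has Q^H Q = I
print(np.allclose(X @ np.diag(eigvals) @ X.conj().T, S))   # S = Q Lambda Q^H
```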

Unitary Matrices

A unitary matrix \(Q\) is a (complex) square matrix that has orthonormal columns.

Unitary matrix that diagonalizes \(S\):

\[\begin{split}Q=\frac{1}{\sqrt{3}}\bb 1&1-i\\1+i&-1 \eb.\end{split}\]

Note

Every matrix \(Q\) with orthonormal columns has \(Q^HQ=I\).

If \(Q\) is square, it is a unitary matrix. Then \(Q^H=Q\im\).

Suppose \(Q\) (with orthonormal columns) multiplies any \(\bs{z}\). The vector length stays the same, because \(\bs{z}^HQ^HQ\bs{z}=\bs{z}^H\bs{z}\). If \(\bs{z}\) is an eigenvector of \(Q\) we learn something more: The eigenvalues of unitary (and orthogonal) matrices \(Q\) all have absolute value \(|\ld|=1\).

Note

If \(Q\) is unitary then \(\lv Q\bs{z}\rv=\lv\bs{z}\rv\). Therefore \(Q\bs{z}=\ld\bs{z}\) leads to \(|\ld|=1\).

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 9.3 The Fast Fourier Transform

We want to multiply quickly by \(F\) and \(F\im\), the Fourier matrix and its inverse. This is achieved by the Fast Fourier Transform. An ordinary product \(F\bs{c}\) uses \(n^2\) multiplications (\(F\) has \(n^2\) entries). The FFT needs only \(n\) times \(\frac{1}{2}\log_2n\).

Roots of Unity and the Fourier Matrix

The solutions \(z\) of \(z^n=1\) are the \(n\)th roots of unity. They are \(n\) evenly spaced points around the unit circle in the complex plane.

Fourier matrix \(n=4\), \(w=i\):

\[\begin{split}F=\bb 1&1&1&1\\1&w&w^2&w^3\\1&w^2&w^4&w^6\\1&w^3&w^6&w^9 \eb= \bb 1&1&1&1\\1&i&i^2&i^3\\1&i^2&i^4&i^6\\1&i^3&i^6&i^9 \eb.\end{split}\]

The matrix is symmetric (\(F=F^T\)). It is not Hermitian. Its main diagonal is not real. But \(\frac{1}{2}F\) is a unitary matrix, which means that \((\frac{1}{2}F^H)(\frac{1}{2}F)=I\):

Note

The columns of \(F\) give \(F^HF=4I\). Its inverse is \(\frac{1}{4}F^H\) which is \(F\im=\frac{1}{4}\bar{F}\).

Every column has length \(\sqrt{n}\). So the unitary matrices are \(Q=F/\sqrt{n}\) and \(Q\im=\bar{F}/\sqrt{n}\). We avoid \(\sqrt{n}\) and just use \(F\) and \(F\im=\bar{F}/n\). The main point is to multiply \(F\) times \(c_0,c_1,c_2,c_3\):

4-point Fourier series:

\[\begin{split}\bb y_0\\y_1\\y_2\\y_3 \eb=F\bs{c}=\bb 1&1&1&1\\1&w&w^2&w^3\\1&w^2&w^4&w^6\\ 1&w^3&w^6&w^9 \eb\bb c_0\\c_1\\c_2\\c_3 \eb.\end{split}\]

The first output \(y_0=c_0+c_1+c_2+c_3\) is the value of the Fourier series \(\sum c_ke^{ikx}\) at \(x=0\). The second output is the value of that series \(\sum c_ke^{ikx}\) at \(x=2\pi/4\):

\[y_1=c_0+c_1e^{i2\pi/4}+c_2e^{i4\pi/4}+c_3e^{i6\pi/4}=c_0+c_1w+c_2w^2+c_3w^3.\]

The third and fourth outputs \(y_2\) and \(y_3\) are the values of \(\sum c_ke^{ikx}\) at \(x=4\pi/4\) and \(x=6\pi/4\). These are finite Fourier series! They contain \(n=4\) terms and they are evaluated at \(n=4\) points. Those points \(x=0,2\pi/4,4\pi/4,6\pi/4\) are equally spaced.

We follow the convention that \(j\) and \(k\) go from \(0\) to \(n-1\) (instead of \(1\) to \(n\)).

The \(n\) by \(n\) Fourier matrix contains powers of \(w=e^{2\pi i/n}\):

\[\begin{split}F_n\bs{c}=\bb 1&1&1&\cd&1\\1&w&w^2&\cd&w^{n-1}\\1&w^2&w^4&\cd&w^{2(n-1)}\\ \cd&\cd&\cd&\cd&\cd\\1&w^{n-1}&w^{2(n-1)}&\cd&w^{(n-1)^2} \eb \bb c_0\\c_1\\c_2\\\cd\\c_{n-1} \eb=\bb y_0\\y_1\\y_2\\\cd\\y_{n-1} \eb=\y.\end{split}\]

\(F_n\) is symmetric but not Hermitian. Its columns are orthogonal, and \(F_n\bar{F}_n=nI\). Then \(F\im_n\) is \(\bar{F}_n/n\). The inverse contains powers of \(\bar{w}_n=e^{-2\pi i/n}\).

Note

The entry in row \(j\), column \(k\) is \(w^{jk}\). Row zero and column zero contain \(w^0=1\).

When a function \(f(x)\) has period \(2\pi\), we can change variables to \(z=e^{ix}\); the function is then defined around the unit circle. The Discrete Fourier Transform is the same as interpolation. Find the polynomial \(p(z)=c_0+c_1z+\cds+c_{n-1}z^{n-1}\) that matches \(n\) values \(f_0,\cds,f_{n-1}\):

Note

Interpolation: Find \(c_0,\cds,c_{n-1}\) so that \(p(z)=f\) at \(n\) points \(z=1,\cds,w^{n-1}\).

The Fourier matrix is the Vandermonde matrix for interpolation at those \(n\) special points.

One Step of the Fast Fourier Transform

The key idea of FFT is to connect \(F_n\) with the half-size Fourier matrix \(F_{n/2}\). Assume that \(n\) is a power of 2. We will connect \(F_4\) to two copies of \(F_2\):

\[\begin{split}F_4=\bb 1&1&1&1\\1&i&i^2&i^3\\1&i^2&i^4&i^6\\1&i^3&i^6&i^9 \eb\quad\rm{and} \quad\bb \\\ &F_2\\&&F_2& \\\ \eb=\bb 1&1\\1&i^2\\&&1&1\\&&1&i^2 \eb\end{split}\]

Factors for FFT:

\[\begin{split}F_4=\bb 1&&1\\&1&&i\\1&&-1\\&1&&-i \eb\bb 1&1\\1&i^2\\&&1&1\\&&1&i^2 \eb\bb 1\\&&1\\&1\\&&&1 \eb.\end{split}\]

Note

\(F_{1024}=\bb I_{512}&D_{512}\\I_{512}&-D_{512}\eb\bb F_{512}\\&F_{512}\eb\bb\rm{even-odd}\\\rm{permutation}\eb\).

\(I_{512}\) is the identity matrix. \(D_{512}\) is the diagonal matrix with entries \((1,w,\cds,w^{511})\). The two copies of \(F_{512}\) use the 512th root of unity (which is nothing but \(w^2\)). The permutation matrix separates the incoming vector \(\bs{c}\) into its even and odd parts \(\bs{c}\pr=(c_0,c_2,\cds,c_{1022})\) and \(\bs{c}\ppr=(c_1,c_3,\cds,c_{1023})\).

Note

One step of the FFT: Set \(m=\frac{1}{2}n\). The first \(m\) and last \(m\) components of \(\y=F_n\bs{c}\) combine the half-size transforms \(\y\pr=F_m\bs{c}\pr\) and \(\y\ppr=F_m\bs{c}\ppr\).

  • \(y_j=y_j\pr+(w_n)^jy_j\ppr,j=0,\cds,m-1\)

  • \(y_{j+m}=y_j\pr-(w_n)^jy_j\ppr,j=0,\cds,m-1\).

Split \(\bs{c}\) into \(\bs{c}\pr\) and \(\bs{c}\ppr\), transform them by \(F_m\) into \(\y\pr\) and \(\y\ppr\), then reconstruct \(\y\).

Those formulas come from separating \(c_0,\cds,c_{n-1}\) into even \(c_{2k}\) and odd \(c_{2k+1}\): \(w\) is \(w_n\).

\[\y=\bs{Fc}\quad y_j=\sum_0^{n-1}w^{jk}c_k=\sum_0^{m-1}w^{2jk}c_{2k}+ \sum_0^{m-1}w^{j(2k+1)}c_{2k+1}\rm{\ with\ }m=\frac{1}{2}n.\]

The even \(c\)’s go into \(\bs{c}\pr=(c_0,c_2,\cds)\) and the odd \(c\)’s go into \(\bs{c}\ppr=(c_1,c_3,\cds)\). Then come the transforms \(F_m\bs{c}\pr\) and \(F_m\bs{c}\ppr\). The key is \(w_n^2=w_m\). This gives \(w_n^{2jk}=w_m^{jk}\).

\[y_j=\sum(w_m)^{jk}c_k\pr+(w_n)^j\sum(w_m)^{jk}c_k\ppr=y_j\pr+(w_n)^jy_j\ppr.\]

For \(j\geq m\), the minus sign comes from factoring out \((w_n)^m=-1\) from \((w_n)^j\).

The Full FFT by Recursion

The final count for size \(n=2^l\) is reduced from \(n^2\) to \(\frac{1}{2}nl\).
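
A minimal sketch of that recursion in NumPy. One caution about conventions: with \(w=e^{2\pi i/n}\) as in this chapter, \(F_n\bs{c}\) equals \(n\) times NumPy’s inverse FFT, and the code assumes \(n\) is a power of 2.

```python
import numpy as np

def fft_rec(c):
    """Compute y = F_n c with w = e^{2*pi*i/n}, by the even-odd recursion (n = 2^l)."""
    c = np.asarray(c, dtype=complex)
    n = len(c)
    if n == 1:
        return c
    m = n // 2
    y_even = fft_rec(c[0::2])                     # F_m applied to c'  (even part)
    y_odd  = fft_rec(c[1::2])                     # F_m applied to c'' (odd part)
    w = np.exp(2j * np.pi / n) ** np.arange(m)    # (w_n)^j for j = 0, ..., m-1
    return np.concatenate([y_even + w * y_odd,    # y_j     = y'_j + w^j y''_j
                           y_even - w * y_odd])   # y_{j+m} = y'_j - w^j y''_j

c = np.random.default_rng(1).standard_normal(1024)
# With this sign convention F_n c equals n times NumPy's inverse FFT
print(np.allclose(fft_rec(c), len(c) * np.fft.ifft(c)))   # True
```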

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 12 Linear Algebra in Probability & Statistics

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 12.1 Mean, Variance, and Probability

  • The mean is the average value or expected value.

  • The variance \(\sg^2\) measures the average squared distance from the mean \(m\).

  • The probabilities of \(n\) different outcomes are positive numbers \(p_1,\cds,p_n\) adding to 1.

The sample mean starts with \(N\) samples \(x_1,\cds,x_N\) from a completed trial. Their mean is the average of the \(N\) observed samples:

Sample mean: \(\dp m=\mu=\frac{1}{N}(x_1+x_2+\cds+x_N)\).

The expected value of \(\x\) starts with the probabilities \(p_1,\cds,p_n\) of \(x_1,\cds,x_n\):

Note

Expected value: \(m=\rm{E}[x]=p_1x_1+p_2x_2+\cds+p_nx_n\).

This is \(\p\cd\x\). Notice that \(m=\rm{E}[x]\) tells us what to expect, while \(m=\mu\) tells us what we got.

Variance (around the mean)

The variance \(\sg^2\) measures expected distance (squared) from the expected mean \(\bs{E}[x]\). The sample variance \(S^2\) measures actual distance (squared) from the sample mean. The square root is the standard deviation \(\sg\) or \(S\).

Note

Sample variance: \(\dp S^2=\frac{1}{N-1}[(x_1-m)^2+\cds+(x_N-m)^2]\).

Statisticians divide by \(N-1\) and not \(N\) so that \(S^2\) is an unbiased estimate of \(\sg^2\). One degree of freedom is already accounted for in the sample mean.

An important identity comes from splitting each \((x-m)^2\) into \(x^2-2mx+m^2\):

sum of \((x_i-m)^2=(\)sum of \(x_i^2)-2m(\)sum of \(x_i)+(\)sum of \(m^2)\)

\(=(\)sum of \(x_i^2)-2m(Nm)+Nm^2=(\)sum of \(x_i^2)-Nm^2\).

Note

Variance: \(\sg^2=\rm{E}[(x-m)^2]=p_1(x_1-m)^2+\cds+p_n(x_n-m)^2\).
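
A short numerical illustration of these formulas (a sketch; the samples and probabilities are arbitrary, and ddof=1 gives the \(N-1\) divisor):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # N = 8 samples

m = x.mean()                        # sample mean
S2 = x.var(ddof=1)                  # sample variance with the N-1 divisor
print(m, S2)
print(np.isclose(S2, ((x - m)**2).sum() / (len(x) - 1)))  # same formula written out

# Expected value and variance from probabilities p_i and outcomes x_i
p = np.array([0.25, 0.25, 0.5])
vals = np.array([1.0, 2.0, 4.0])
mean = p @ vals                     # E[x] = sum p_i x_i
var = p @ (vals - mean)**2          # sigma^2 = sum p_i (x_i - m)^2
print(mean, var)
```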

Continuous Probability Distributions

\(F=\) integral of \(p\): Probability of \(\dp a\leq x\leq b=\int_a^b p(x)dx=F(b)-F(a)\).

Mean and Variance of \(p(x)\)

Note

Mean: \(\dp m=\rm{E}[x]=\int xp(x)dx\).

Note

Variance: \(\dp\sg^2=\rm{E}[(x-m)^2]=\int p(x)(x-m)^2dx\).

Note

Uniform distribution for \(0\leq x\leq a\); Density \(\dp p(x)=\frac{1}{a}\); Cumulative \(\dp F(x)=\frac{x}{a}\):

  • Mean: \(\dp m=\frac{a}{2}\) halfway

  • Variance: \(\dp \sg^2=\int_0^a\frac{1}{a}\left(x-\frac{a}{2}\right)^2dx=\frac{a^2}{12}\).

Normal Distribution: Bell-shaped Curve

Note

Central Limit Theorem (informal): The average of \(N\) samples of “any” probability distribution approaches a normal distribution as \(N\rightarrow\infty\).

The standard normal distribution is symmetric around \(x=0\), so its mean value is \(m=0\). It is chosen to have a standard variance \(\sg^2=1\). It is called \(\bs{\rm{N}}(0,1)\).

Note

Standard normal distribution: \(\dp p(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\)

  • Total probability \(=1\): \(\dp \int_{-\infty}^{\infty}p(x)dx=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-x^2/2}dx=1\)

  • Mean \(\rm{E}[x]=0\): \(\dp m=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}xe^{-x^2/2}dx=0\)

  • Variance \(\rm{E}[x^2]=1\): \(\dp\sg^2=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(x-0)^2e^{-x^2/2}dx=1\)

The probability that a random sample falls between \(-\sg\) and \(\sg\) is \(F(\sg)-F(-\sg)\approx\frac{2}{3}\). This is because \(\int_{-\sg}^{\sg}p(x)dx\) equals \(\int_{-\infty}^{\sg}p(x)dx-\int_{-\infty}^{-\sg}p(x)dx=F(\sg)-F(-\sg)\).

Similarly, the probability that a random \(x\) lies between \(-2\sg\) and \(2\sg\) is \(F(2\sg)-F(-2\sg)\approx 0.95\).
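
Those two probabilities come straight from the cumulative distribution \(F\); for the standard normal, \(F(x)=\frac{1}{2}(1+\rm{erf}(x/\sqrt{2}))\), so a few lines suffice (a sketch):

```python
import math

def F(x):
    """Cumulative distribution of the standard normal N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(F(1) - F(-1))    # about 0.6827  (between -sigma and sigma)
print(F(2) - F(-2))    # about 0.9545  (between -2*sigma and 2*sigma)
```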

The normal distribution with any mean \(m\) and standard deviation \(\sg\) comes by shifting and stretching the standard \(\bs{\rm{N}}(0,1)\). Shift \(x\) to \(x-m\). Stretch \(x-m\) to \((x-m)/\sg\).

Note

Gaussian density \(p(x)\); Normal distribution \(\bs{\rm{N}}(m,\sg)\):

  • \(\dp p(x)=\frac{1}{\sg\sqrt{2\pi}}e^{-(x-m)^2/2\sg^2}\)

The integral of \(p(x)\) is \(F(x)\)–the probability that a random sample will fall below \(x\). The differential \(p(x)dx=F(x+dx)-F(x)\) is the probability that a random sample will fall between \(x\) and \(x+dx\). There is no simple formula to integrate \(e^{-x^2/2}\), so this cumulative distribution \(F(x)\) is computed and tabulated very carefully.

\(N\) Coin Flips and \(N\rightarrow \infty\)

Note

Linearity: \(x_{\rm{new}}=ax_{\rm{old}}+b\) has \(m_{\rm{new}}=am_{\rm{old}}+b\) and \(\sg_{\rm{new}}^2=a^2\sg_{\rm{old}}^2\).

Note

Shifted and scaled: \(\dp X=\frac{x-m}{\sg}=\frac{x-\frac{1}{2}N}{\sqrt{N}/2}\)

  • Subtracting \(m\) is “centering” or “detrending”. The mean of \(X\) is zero.

  • Dividing by \(\sg\) is “normalizing” or “standardizing”. The variance of \(X\) is 1.

Note

The center probability \(\dp\bigg(\frac{N}{2}\) heads, \(\dp\frac{N}{2}\) tails\(\bigg)\) is \(\dp\frac{1}{2^N}\frac{N!}{(N/2)!(N/2)!}\).

For large \(N\), Stirling’s formula \(\sqrt{2\pi N}(N/e)^N\) is a close approximation to \(N!\). Use Stirling for \(N\) and twice for \(N/2\):

Limit of coin-flip Center probability:

\[p_{N/2}\approx\frac{1}{2^N}\frac{\sqrt{2\pi N}(N/e)^N}{\pi N(N/2e)^N}= \frac{\sqrt{2}}{\sqrt{\pi N}}=\frac{1}{\sqrt{2\pi}\sg}.\]

Monte Carlo Estimation Methods

Applied mathematics has moved to accepting uncertainty in the inputs and estimating the variance in the outputs. The Monte Carlo method approximates an expected value \(\rm{E}[x]\) by a sample average \((x_1+\cds+x_N)/N\).

Each sample comes from a set of data \(b_k\). Monte Carlo randomly chooses this data \(b_k\), computes the outputs \(x_k\), and then averages those \(x\)’s. Decent accuracy for \(\rm{E}[x]\) often requires many samples \(b\) and huge computing cost. The error in approximating \(\rm{E}[x]\) by \((x_1+\cds+x_N)/N\) is normally of order \(1/\sqrt{N}\): improvement is slow as \(N\) increases.

Suppose it is much simpler to simulate another variable \(y(b)\) close to \(x(b)\). Then use \(N\) computations of \(y(b_k)\) and only \(N^*<N\) computations of \(x(b_k)\) to estimate \(\rm{E}[x]\).

Note

2-level Monte Carlo:

  • \(\dp \rm{E}[x]\approx\frac{1}{N}\sum_1^N y(b_k)+\frac{1}{N^*}\sum_1^{N^*}[x(b_k)-y(b_k)]\).

The idea is that \(x-y\) has a smaller variance \(\sg^*\) than the original \(x\). Therefore \(N^*\) can be smaller than \(N\), with the same accuracy for \(\rm{E}[x]\). We do \(N\) cheap simulations to find the \(y\)’s. Those cost \(C\) each. We only do \(N^*\) expensive simulations involving \(x\)’s. Those cost \(C^*\) each. The total computing cost is \(NC+N^*C^*\).
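
A minimal sketch of the two-level estimator (the variables \(x(b)\) and \(y(b)\) below are artificial stand-ins, with \(y\) cheap and close to \(x\)):

```python
import numpy as np

rng = np.random.default_rng(0)

def x_of(b):                 # "expensive" quantity of interest (artificial example)
    return np.sin(b) + 0.01 * b**2

def y_of(b):                 # "cheap" approximation, close to x
    return np.sin(b)

N, N_star = 100_000, 2_000   # many cheap samples, few expensive ones

b_cheap = rng.standard_normal(N)
b_exp = rng.standard_normal(N_star)

# E[x] ~ (1/N) sum y(b_k)  +  (1/N*) sum [x(b_k) - y(b_k)]
estimate = y_of(b_cheap).mean() + (x_of(b_exp) - y_of(b_exp)).mean()
print(estimate)              # close to E[x] = E[sin b] + 0.01*E[b^2] = 0.01
```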

Calculus minimizes the overall variance for a fixed total cost. The optimal ratio \(N^*/N\) is \(\sqrt{C/C^*}\sg^*/\sg\). Three-level Monte Carlo would simulate \(x,y\) and \(z\):

\[\rm{E}[x]\approx\frac{1}{N}\sum_1^N z(b_k)+\frac{1}{N^*}\sum_1^{N^*}[y(b_k)- z(b_k)]+\frac{1}{N^{**}}\sum_1^{N^{**}}[x(b_k)-y(b_k)].\]

Review: Three Formulas for the Mean and the Variance

Note

  1. Samples \(X_1\) to \(X_N\):

    • \(\dp m=\frac{X_1+\cds+X_N}{N}\)

    • \(\dp S^2=\frac{(X_1-m)^2+\cds+(X_N-m)^2}{N-1}\)

  2. \(n\) possible outputs with probabilities \(p_i\):

    • \(\dp m=\sum_1^np_ix_i\)

    • \(\dp \sg^2=\sum_1^np_i(x_i-m)^2\)

  3. Range of outputs with probability density:

    • \(\dp m=\int xp(x)dx\)

    • \(\dp \sg^2=\int(x-m)^2p(x)dx\)

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 12.2 Covariance Matrices and Joint Probabilities

\(p_{ij}=\) Probability that experiment 1 produces \(x_i\) and experiment 2 produces \(y_j\).

Note

Covariance: \(\dp\sg_{12}=\sum_{\rm{all}}\sum_{i,j}p_{ij}(x_i-m_1)(y_j-m_2)\).

Probability matrix:

\[\begin{split}P=\bb p_{11}&p_{12}\\p_{21}&p_{22} \eb\end{split}\]

Notice the row sums \(p_i\) and column sums \(P_i\) and the total sum = 1. Those numbers \(p_1,p_2\) and \(P_1,P_2\) are called the marginals of the matrix \(P\).

Note

Zero covariance \(\sg_{12}\) for independent trials:

  • \(V=\bb \sg_1^2&0\\0&\sg_2^2 \eb=\) diagonal covariance matrix.

Independent experiments have \(\sg_{12}=0\) because every \(p_{ij}=(p_i)(p_j)\) in the equation below:

\[ \begin{align}\begin{aligned}\sg_{12}=\sum_i\sum_j(p_i)(p_j)(x_i-m_1)(y_j-m_2)=\\\bigg[\sum_i(p_i)(x_i-m_1)\bigg]\bigg[\sum_j(p_j)(y_j-m_2)\bigg]=[0][0].\end{aligned}\end{align} \]

Always \(\sg_1^2\sg_2^2\geq\sg_{12}^2\). Thus \(\sg_{12}\) is between \(-\sg_1\sg_2\) and \(\sg_1\sg_2\). The covariance matrix \(V\) is positive (semi)definite. This is an important fact about \(M\) by \(M\) covariance matrices for \(M\) experiments.

Note that the sample covariance matrix \(S\) from \(N\) trials is certainly semidefinite. Every new sample \(X\) contributes to the sample mean \(\bar{\X}\) and to \(S\). Each term \((X_i-\bar{\X})(X_i-\bar{\X})^T\) is positive semidefinite and we just add to reach \(S\):

Note

\(\dp\bar{\X}=\frac{X_1+\cds+X_N}{N}\)

\(\dp S=\frac{(X_1-\bar{\X})(X_1-\bar{\X})^T+\cds+(X_N-\bar{\X})(X_N-\bar{\X})^T}{N-1}\)
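
NumPy’s np.cov computes exactly this sample covariance matrix when rows are variables and columns are the \(N\) trials (a sketch; the default divisor is \(N-1\)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 500))            # 2 variables, N = 500 trials
X[1] += 0.5 * X[0]                           # make the two rows correlated

S = np.cov(X)                                # 2x2 sample covariance (divisor N-1)
print(S)
print(np.all(np.linalg.eigvalsh(S) >= 0))    # positive semidefinite
```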

The Covariance Matrix \(V\) is Positive Semidefinite

Total probability (all pairs) is 1:

\[\sum_{\rm{all}}\sum_{i,j}p_{ij}=1.\]

Row sum \(p_i\) of \(P\):

\[\sum_{j=1}^np_{ij}=\rm{probability\ }p_i\rm{\ of\ }x_i\rm{\ in\ experiment\ 1}.\]

Note

Covariance matrix \(V=\sum\sum V_{ij}\):

  • \(\dp V=\sum_{\rm{all}}\sum_{i,j}p_{ij}\bb (x_i-m_1)^2&(x_i-m_1)(y_j-m_2)\\(x_i-m_1)(y_j-m_2)&(y_j-m_2)^2\eb\).

\[V_{11}=\sum_{\rm{all}}\sum_{i,j}p_{ij}(x_i-m_1)^2=\sum_{\rm{all\ }i}(\rm{probability\ of\ }x_i)(x_i-m_1)^2=\sg_1^2.\]

The matrix \(V_{ij}\) for each pair of outcomes \(i,j\) is positive semidefinite. \(V_{ij}\) has diagonal entries \(p_{ij}(x_i-m_1)^2\geq 0\) and \(p_{ij}(y_j-m_2)^2\geq 0\) and \(\det(V_{ij})=0\). That matrix \(V_{ij}\) has rank 1.

\[\begin{split}\bb (x_i-m_1)^2&(x_i-m_1)(y_j-m_2)\\(x_i-m_1)(y_j-m_2)&(y_j-m_2)^2\eb= \bb x_i-m_1\\y_j-m_2 \eb\bb x_i-m_1&y_j-m_2 \eb.\end{split}\]

Every matrix \(UU^T\) is positive semidefinite. So the whole matrix \(V\) (combining these matrices \(UU^T\) with weights \(p_{ij}\geq 0\)) is at least semidefinite.

The covariance matrix \(V\) is positive definite unless the experiments are dependent.

Note

Covariance matrix: \(V=\rm{E}[(\X-\bar{\X})(\X-\bar{\X})^T]\)

\[ \begin{align}\begin{aligned}\rm{Variance\ of\ }c^T\X=\rm{E}[(c^T\X-c^T\bar{\X})(c^T\X-c^T\bar{\X})^T]\\=c^T\rm{E}[(\X-\bar{\X})(\X-\bar{\X})^T]c=c^TVc\end{aligned}\end{align} \]

The variance of \(c^T\X\) can never be negative. So \(c^TVc\geq 0\). The covariance matrix \(V\) is therefore positive semidefinite by the energy test \(c^TVc\geq 0\).

\(V\) equals \(Q\Ld Q^T\) with eigenvalues \(\ld_i\geq 0\) and orthonormal eigenvectors \(\q_1\) to \(\q_M\). Diagonalizing the covariance matrix means finding \(M\) independent experiments as combinations of the original \(M\) experiments.

Note

Covariance matrix: \(\dp V=\iiint p(x,y,z)UU^Tdxdydz\) with \(U=\bb x-\bar{x}\\y-\bar{y}\\z-\bar{z} \eb\).

  • Independent variables \(x,y,z\): \(p(x,y,z)=p_1(x)p_2(y)p_3(z)\).

  • Dependent variables \(x,y,z\): \(p(x,y,z)=0\) except when \(cx+dy+ez=0\).

The Mean and Variance of \(z=x+y\)

The sample mean of \(z=x+y\) is clearly \(m_z=m_x+m_y\):

Note

Mean of sum = Sum of means:

  • \(\dp\frac{1}{N}\sum_1^N(x_i+y_i)=\frac{1}{N}\sum_1^Nx_i+\frac{1}{N}\sum_1^Ny_i\).

\[ \begin{align}\begin{aligned}\rm{E}[x+y]=\sum_i\sum_j p_{ij}(x_i+y_j)=\sum_i\sum_j p_{ij}x_i+\sum_i\sum_j p_{ij}y_j.\\\sum_i\sum_j p_{ij}x_i=\sum_i(p_{i1}+\cds+p_{iN})x_i=\sum_i p_ix_i=\rm{E}[x]\\\sum_i\sum_j p_{ij}y_j=\sum_j(p_{1j}+\cds+p_{nj})y_j=\sum_j p_jy_j=\rm{E}[y]\end{aligned}\end{align} \]
\[ \begin{align}\begin{aligned}\sg_z^2&=\sum\sum p_{ij}(x_i+y_j-m_x-m_y)^2\\&=\sum\sum p_{ij}(x_i-m_x)^2+\sum\sum p_{ij}(y_j-m_y)^2+2\sum\sum p_{ij}(x_i-m_x)(y_j-m_y)\end{aligned}\end{align} \]

The variance of \(z=x+y\) is \(\sg_z^2=\sg_x^2+\sg_y^2+2\sg_{xy}\).

The Covariance Matrix for \(Z=AX\)

Note

The covariance matrix of \(Z=AX\) is \(V_Z=AV_XA^T\).
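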

The Correlation \(\rho\)

The new \(X=x/\sg_x\) and \(Y=y/\sg_y\) have variance \(\sg_X^2=\sg_Y^2=1\). The correlation of \(x\) and \(y\) is the covariance of \(X\) and \(Y\).

Note

Correlation: \(\dp\rho_{xy}=\frac{\sg_{xy}}{\sg_x\sg_y}=\) covariance of \(\dp\frac{x}{\sg_x}\) and \(\dp\frac{y}{\sg_y}\). Always \(-1\leq\rho_{xy}\leq 1\).

Zero covariance gives zero correlation. Independent random variables produce \(\rho_{xy}=0\).

Since \(\sg_{xy}^2\leq\sg_x^2\sg_y^2\), then \(\rho_{xy}^2\leq 1\). Correlation near \(\rho=+1\) means strong dependence in the same direction. Negative correlation means that \(y\) tends to be below its mean when \(x\) is above its mean.

\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\vds}{\vdots} \newcommand{\lv}{\lVert} \newcommand{\rv}{\rVert} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\a}{\boldsymbol{a}} \newcommand{\b}{\boldsymbol{b}} \newcommand{\e}{\boldsymbol{e}} \newcommand{\i}{\boldsymbol{i}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\p}{\boldsymbol{p}} \newcommand{\q}{\boldsymbol{q}} \newcommand{\u}{\boldsymbol{u}} \newcommand{\v}{\boldsymbol{v}} \newcommand{\w}{\boldsymbol{w}} \newcommand{\x}{\boldsymbol{x}} \newcommand{\y}{\boldsymbol{y}}\\\newcommand{\A}{\boldsymbol{A}} \newcommand{\B}{\boldsymbol{B}} \newcommand{\C}{\boldsymbol{C}} \newcommand{\N}{\boldsymbol{N}} \newcommand{\X}{\boldsymbol{X}}\\\newcommand{\R}{\boldsymbol{\mathrm{R}}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\Lambda} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\Sigma} \newcommand{\th}{\theta}\\\newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 12.3 Multivariate Gaussian and Weighted Least Squares

Mean \(m\) and variance \(\sg^2\):

\[p(x)=\frac{1}{\sqrt{2\pi}\sg}e^{-(x-m)^2/2\sg^2}.\]
\[\int_{-\infty}^\infty p(x)dx=1\quad\rm{and}\quad \int_{m-\sg}^{m+\sg}p(x)dx=\frac{1}{\sqrt{2\pi}}\int_{-1}^1e^{-X^2/2}dX \approx\frac{2}{3}.\]

Every Gaussian turns into a standard Gaussian \(p(x)\) with mean \(m=0\) and variance \(\sg^2=1\). The standard normal distribution \(N(0,1)\) has \(\dp p(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\).

Integrating \(p(x)\) from \(-\infty\) to \(x\) gives the cumulative distribution \(F(x)\): the probability that a random sample is below \(x\). The probability will be \(F=\frac{1}{2}\) at \(x=0\).

Two-dimensional Gaussians

Independent \(x\) and \(y\):

\[p(x,y)=\frac{1}{2\pi\sg_1\sg_2}e^{-(x-m_1)^2/2\sg_1^2}e^{-(y-m_2)^2/2\sg_2^2}.\]

The covariance of \(x\) and \(y\) will be \(\sg_{12}=0\). The covariance matrix \(V\) will be diagonal.

Notice that the two exponents can be combined into \(-\frac{1}{2}(\x-\bs{m})^TV\im(\x-\bs{m})\) with \(V\im\) in the middle:

\[\begin{split}-\frac{(x-m_1)^2}{2\sg_1^2}-\frac{(y-m_2)^2}{2\sg_2^2}=-\frac{1}{2} \bb x-m_1&y-m_2 \eb\bb \sg_1^2&0\\0&\sg_2^2 \eb\im\bb x-m_1\\y-m_2 \eb.\end{split}\]

Non-independent \(x\) and \(y\)

Note

Multivariate Gaussian probability distribution:

  • \(\dp p(\x)=\frac{1}{(\sqrt{2\pi})^M\sqrt{\det V}}e^{-(\x-\bs{m})^TV\im(\x-\bs{m})/2}\).

\[X=\x-\bs{m}\quad(\x-\bs{m})^TV\im(\x-\bs{m})=X^TQ\Ld\im Q^TX=Y^T\Ld\im Y.\]

Notice that the combinations \(Y=Q^TX=Q^T(\x-\bs{m})\) are statistically independent. Their covariance matrix \(\Ld\) is diagonal.

\[ \begin{align}\begin{aligned}\int\cds\int e^{-Y^T\Ld\im Y/2}dY= \bigg(\int_{-\infty}^{\infty}e^{-y_1^2/2\ld_1}dy_1\bigg)\cds \bigg(\int_{-\infty}^{\infty}e^{-y_M^2/2\ld_M}dy_M\bigg)\\=(\sqrt{2\pi\ld_1})\cds(\sqrt{2\pi\ld_M})=(\sqrt{2\pi})^M\sqrt{\det V}.\end{aligned}\end{align} \]

Vector \(\bs{m}\) of means:

\[\int\cds\int\x p(\x)d\x=(m_1,m_2,\cds)=\bs{m}.\]

Covariance matrix \(V\):

\[\int\cds\int(\x-\bs{m})p(\x)(\x-\bs{m})^Td\x=V.\]

Weighted Least Squares

The good measure of error is \(E=(\b-A\x)^TV\im(\b-A\x)\).

Note

Weighted least squares: \(A^TV\im A\wh{\x}=A^TV\im\b\).

The most important examples have \(m\) independent errors in \(\b\). Those errors have variances \(\sg_1^2,\cds,\sg_m^2\). By independence, \(V\) is a diagonal matrix. The good weights \(1/\sg_1^2,\cds,1/\sg_m^2\) come from \(V\im\). We are weighting the errors in \(\b\) to have variance = 1:

Note

Weighted least squares; Independent errors in \(\b\):

  • Minimize \(\dp E=\sum_{i=1}^m\frac{(\b-A\x)_i^2}{\sg_i^2}\).

  1. Start with \(A\x=\b\) (\(m\) equations, \(n\) unknowns, \(m>n\), no solution)

  2. Each right side \(b_i\) has mean zero and variance \(\sg_i^2\). The \(b_i\) are independent.

  3. Divide the \(i\)th equation by \(\sg_i\) to have variance = 1 for every \(b_i/\sg_i\).

  4. That division turns \(A\x=\b\) into \(V^{-1/2}A\x=V^{-1/2}\b\) with \(V^{-1/2}=\rm{diag}(1/\sg_1,\cds,1/\sg_m)\).

  5. Ordinary least squares on those weighted equations has \(A\rightarrow V^{-1/2}A\) and \(\b\rightarrow V^{-1/2}\b\).

Note

\((V^{-1/2}A)^T(V^{-1/2}A)\wh{\x}=(V^{-1/2}A)^TV^{-1/2}\b\) is \(A^TV\im A\wh{\x}=A^TV\im\b\).
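
A sketch of those five steps (the matrix \(A\), right side \(\b\), and variances \(\sg_i^2\) below are arbitrary choices for illustration): weighting the rows by \(1/\sg_i\) and solving ordinary least squares gives the same \(\wh{\x}\) as the weighted normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 2
A = rng.standard_normal((m, n))
sigma = np.array([1.0, 1.0, 2.0, 2.0, 0.5, 0.5, 1.0, 3.0])      # std dev of each b_i
b = A @ np.array([1.0, -2.0]) + sigma * rng.standard_normal(m)  # noisy right side

# Divide row i of A and b_i by sigma_i, then solve ordinary least squares
Aw = A / sigma[:, None]
bw = b / sigma
x_hat, *_ = np.linalg.lstsq(Aw, bw, rcond=None)

# Same answer from the weighted normal equations A^T V^{-1} A x = A^T V^{-1} b
V_inv = np.diag(1.0 / sigma**2)
x_check = np.linalg.solve(A.T @ V_inv @ A, A.T @ V_inv @ b)
print(np.allclose(x_hat, x_check))     # True
```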

The Variance in the Estimated \(\widehat{x}\)

Note

Variance-covariance matrix \(W\) for \(\wh{\x}\):

  • \(\rm{E}[(\wh{\x}-\x)(\wh{\x}-\x)^T]=(A^TV\im A)\im\).

The smallest possible variance comes from the best possible weighting, which is \(V\im\).

If \(\b\) has covariance matrix \(V\), then \(\wh{\x}=L\b\) has covariance matrix \(LVL^T\).

\[LVL^T=\left[(A^TV\im A)\im A^TV\im\right]\ V\ \left[V\im A(A^TV\im A)\im\right]=(A^TV\im A)\im\]

The Kalman Filter

The \(\wh{\x}_k\) will be our best least squares estimate of the latest solution \(\x_k\) to the whole history of observation equations and update equations (state equations) up to time \(k\).

OLD: \(A_0\x_0=\b_0\) leads to the weighted equation \(A_0^TV_0\im A_0\wh{\x}_0=A_0^TV_0\im\b_0\).

NEW: \(\bb A_0\\A_1 \eb\wh{\x}_1=\bb \b_0\\b_1 \eb\) leads to the following weighted equation for \(\wh{\x}_1\):

\[\begin{split}\bb A_0^T&A_1^T \eb\bb V_0\im\\&V_1\im \eb\bb A_0\\A_1 \eb\wh{\x}_1= \bb A_0^T&A_1^T \eb\bb V_0\im\\&V_1\im \eb\bb \b_0\\\b_1 \eb.\end{split}\]

Note

Kalman update gives \(\wh{\x}_1\) from \(\wh{\x}_0\):

  • \(\wh{\x}_1=\wh{\x}_0+K_1(\b_1-A_1\wh{\x}_0)\).

Note

Covariance \(W_1\) of errors in \(\wh{\x}_1\): \(W_1\im=W_0\im+A_1^TV_1\im A_1\).

Kalman gain matrix \(K_1\): \(K_1=W_1A_1^TV_1\im\).
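
A minimal sketch of one Kalman update applying these two formulas (the sizes and the particular \(A_1,V_1,W_0\) are illustrative assumptions):

```python
import numpy as np

def kalman_update(x0_hat, W0, A1, b1, V1):
    """One step: new estimate x1_hat and its error covariance W1."""
    W1_inv = np.linalg.inv(W0) + A1.T @ np.linalg.inv(V1) @ A1   # W1^{-1} = W0^{-1} + A1^T V1^{-1} A1
    W1 = np.linalg.inv(W1_inv)
    K1 = W1 @ A1.T @ np.linalg.inv(V1)                           # Kalman gain matrix
    x1_hat = x0_hat + K1 @ (b1 - A1 @ x0_hat)                    # correct by the new residual
    return x1_hat, W1

# Illustrative sizes: state in R^2, one new scalar observation
x0_hat = np.array([1.0, 0.0])
W0 = np.eye(2)
A1 = np.array([[1.0, 2.0]])
V1 = np.array([[0.5]])
b1 = np.array([3.0])

print(kalman_update(x0_hat, W0, A1, b1, V1))
```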