We consider a fixed point problem for function f:
Banach fixed-point theorem: Let (X, d) be a non-empty complete metric space with a contraction mapping f : X → X. Then f admits a unique fixed point x✶ in X, namely, f(x✶) = x✶. Furthermore, x✶ can be found as follows: start with an arbitrary element x₀ ∈ X and define a sequence {xk}k∈ℕ by xk+1 = f(xk), for k = 0, 1, 2, …. Then \[ \lim_{k\to \infty} {\bf x}_k = {\bf x}^{\ast} . \]
Upon adding and subtracting the identity matrix, we can reduce this problem to fixed point form \[ \mathbf{x} = \mathbf{T}\,\mathbf{x}, \qquad \mathbf{T}\,\mathbf{x} = \left( \mathbf{I} - \mathbf{A} \right) \mathbf{x} + \mathbf{b} . \] If T is a contraction map (e.g., ∥I − A∥ < 1 in some norm), then Banach's theorem ensures that the recurrence \[ \mathbf{x}^{(k+1)} = \mathbf{T}\,\mathbf{x}^{(k)} , \qquad k=0,1,2,\ldots , \] converges to the unique fixed point x✶
, which solves the system. For instance, \[ \mathbf{A} = \begin{bmatrix} 0.5 & 0.2 \\ 0.1 & 0.4 \end{bmatrix} , \qquad \mathbf{b} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} . \] The fixed point formulation for this case reads: \[ T(\mathbf{x} ) = \left( \mathbf{I} - \mathbf{A} \right) \mathbf{x} + \mathbf{b} . \] Then \[ T(\mathbf{x} ) = \begin{bmatrix} 0.5 & -0.2 \\ -0.1 & 0.6 \end{bmatrix} \mathbf{x} + \begin{pmatrix} 1 \\ 2 \end{pmatrix} . \] Now we check that T is a contraction; indeed, I − A has spectral radius 0.7 < 1, and in the ℓ∞ norm \[ \left\| \mathbf{I} - \mathbf{A} \right\|_{\infty} = \max \left\{ 0.5 + 0.2 , \ 0.1 + 0.6 \right\} = 0.7 < 1 . \]
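As a quick numerical check, the iteration x⁽ᵏ⁺¹⁾ = (I − A)x⁽ᵏ⁾ + b can be run in plain Python (a minimal sketch; the helper `T` and the starting vector are our own choices, and the limit (0, 5) is the exact solution of A x = b for this A and b):

```python
# Fixed-point iteration x_{k+1} = (I - A) x_k + b for the 2x2 example above.
A = [[0.5, 0.2], [0.1, 0.4]]
b = [1.0, 2.0]

def T(x):
    # T x = (I - A) x + b, written out componentwise
    return [(1 - A[0][0]) * x[0] - A[0][1] * x[1] + b[0],
            -A[1][0] * x[0] + (1 - A[1][1]) * x[1] + b[1]]

x = [0.0, 0.0]                      # arbitrary starting vector
for k in range(100):
    x = T(x)

# The iterates approach the unique solution of A x = b, namely (0, 5).
print(x)  # approximately [0.0, 5.0]
```

Since ∥I − A∥∞ = 0.7, each step shrinks the error by at least a factor 0.7, so 100 iterations reduce it below machine precision.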
Example 1B: Let us consider the Fredholm-type integral equation \[ u(x) = \int_0^1 K(x,y) \,u(y) \,{\text d}y + f(x) , \] where K(x, y) is a continuous kernel and f ∈ ℭ([0, 1]). We seek a function u ∈ ℭ([0, 1]) satisfying this integral equation.
We define an operator T on the Banach space ℭ([0, 1]) by \[ \left( T\, u \right) (x) = \int_0^1 K(x,y) \,u(y) \,{\text d}y + f(x) . \] If the kernel K satisfies a norm condition such as \[ \sup_{x \in [0, 1]} \int_0^1 \left\vert K(x,y) \right\vert {\text d}y \le L < 1, \] then T is a contraction map. Banach's theorem guarantees a unique fixed point, which solves the integral equation.
In particular, \[ u(x) = \int_0^1 xy\,u(y)\,{\text d} y + \sin (x). \] This equation is linear in u, and the kernel K(x, y) = xy satisfies \[ \sup_{0 \le x \le 1} \int_0^1 \left\vert xy\right\vert {\text d} y = \sup_{0 \le x \le 1} x\,\int_0^1 \left\vert y\right\vert {\text d} y = \sup_{0 \le x \le 1} x \cdot \frac{1}{2} = \frac{1}{2} . \] So T is a contraction on ℭ([0, 1]) with the sup norm, and Banach's theorem guarantees a unique solution in ℭ([0, 1]).
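Since this equation is linear, its solution is u(x) = c x + sin x with c = ∫₀¹ y u(y) dy, and substituting back gives c = c/3 + (sin 1 − cos 1), i.e., c = (3/2)(sin 1 − cos 1). A short sketch of the Picard iteration uₖ₊₁ = T uₖ on a uniform grid confirms this (the grid size and trapezoidal quadrature are our own discretization choices):

```python
import math

# Picard iteration u_{k+1}(x) = ∫₀¹ x·y·u_k(y) dy + sin(x), discretized with
# the trapezoidal rule on a uniform grid (grid size n is our own choice).
n = 201
xs = [i / (n - 1) for i in range(n)]
h = 1.0 / (n - 1)

def trapz(vals):
    # trapezoidal rule on the uniform grid
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

u = [0.0] * n                        # start from the zero function
for _ in range(60):
    m = trapz([xs[j] * u[j] for j in range(n)])   # ∫₀¹ y u(y) dy
    u = [x * m + math.sin(x) for x in xs]

# Exact solution is u(x) = c x + sin(x) with c = (3/2)(sin 1 − cos 1);
# compare the iterate with it at x = 1.
c = 1.5 * (math.sin(1) - math.cos(1))
print(abs(u[-1] - (c + math.sin(1))))   # small (discretization error only)
```

The contraction constant is ½, so the iteration error vanishes quickly; what remains is only the quadrature error of the trapezoidal rule.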
Example 1C: Let (X, d) be a complete metric space, and suppose we have a transformation T : X → X that models a geometric process---say, an iterated function system (IFS) used in fractal generation.
If each map in the IFS is a contraction, then the composition T is also a contraction. Banach's theorem ensures the existence of a unique fixed point x✶ ∈ X, which is the attractor of the system.
The classic construction of the Cantor set or the Sierpiński triangle relies on this principle. The fixed point is not a number, but a compact subset of the plane---beautifully illustrating how Banach's theorem governs geometric convergence.
In particular, let us take X = ℝ² with the Euclidean metric. We define T : ℝ² → ℝ² by \[ T(x, y) = \left( \frac{1}{2}\, x + 1 , \ \frac{1}{2}\,y \right) . \] To check that T is a contraction, for any two points (x₁, y₁) and (x₂, y₂) from ℝ² we have \[ \| T\left( x_1 , y_1 \right) - T\left( x_2 , y_2 \right) \| = \left\| \left( \frac{1}{2} \left( x_1 - x_2 \right) , \frac{1}{2} \left( y_1 - y_2 \right) \right) \right\| = \frac{1}{2} \left\| \left( x_1 - x_2 , \ y_1 - y_2 \right) \right\| . \] So T is a contraction with constant ½, and the unique fixed point is \[ \left( x, y \right) = \left( \frac{1}{2}\,x + 1 , \ \frac{1}{2}\,y \right) \quad \Rightarrow \quad x = 2, \ y = 0. \] This example models a geometric transformation that shrinks and shifts points toward a fixed location---like a simple affine IFS step. ■
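Iterating this affine map from any starting point illustrates the convergence directly (the starting point below is an arbitrary choice):

```python
# Iterating the affine contraction T(x, y) = (x/2 + 1, y/2) from an arbitrary
# starting point; by Banach's theorem the orbit converges to the fixed point (2, 0).
x, y = -7.0, 13.0
for _ in range(60):
    x, y = 0.5 * x + 1.0, 0.5 * y

print(round(x, 6), round(y, 6))   # 2.0 0.0
```

Each step halves the distance to (2, 0), so 60 iterations bring the orbit within machine precision of the fixed point.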
For future consideration, we need to exploit the so-called Jordan normal form of matrix H, namely, H = S−1JS, where S is an invertible matrix and J is a block diagonal matrix consisting of the Jordan blocks, \[ \mathbf{J} = \begin{bmatrix} \fbox{J_1} &&& \\ & \fbox{J_2} && \\ &&\ddots& \\ &&& \fbox{J_m} \end{bmatrix} , \] and each Jordan block has the form: \[ \mathbf{J}_i = \begin{bmatrix} \lambda_i & 1 &0&\cdots & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 \\ &&&\ddots& \\ 0&0&0& \cdots & \lambda_i \end{bmatrix} \in \mathbb{R}^{n_i \times n_i} . \]
We split every Jordan block into the sum of a diagonal matrix and a nilpotent matrix, namely, Ji = λiI + P. Notice that Ps = 0 for some positive integer s ≤ ni.
Then every integer power of H can be written as \[ \mathbf{H}^k = \mathbf{S}^{-1} \mathbf{J}^k \mathbf{S} , \qquad k = 1, 2, \ldots . \]
Theorem 2: Let H be an n-by-n matrix. Then Hkz → 0 for arbitrary column vector z if and only if each eigenvalue of H is less than 1 in absolute value.
Hkz diverges for some vectors z when ρ(H) > 1. Here ρ(H) is the largest modulus of the eigenvalues of the matrix H; this number ρ(H) is called the spectral radius of matrix H.
If u is complex, then u = x + ⅉy for some real vectors x, y. But then at least one of the sequences (Hkx) or (Hky) does not tend to zero: if both did, then also Hku = Hkx + ⅉHky → 0, and this contradicts \( \displaystyle \quad \| \mathbf{H}^k \mathbf{u} \|_{\infty} > \gamma . \)
Now let ρ(H) < 1, and assume for simplicity that H possesses n linearly independent eigenvectors (uj) such that H uj = λjuj. Linear independence means that every z ∈ ℝn×1 can be expressed as a linear combination of the eigenvectors, i.e., there exist constants cj ∈ ℂ such that \( \displaystyle \quad \mathbf{z} = \sum_{1\le j \le n} c_j \mathbf{u}_j . \quad \) Hence, \[ \mathbf{H}^k \mathbf{z} = \sum_{j=1}^n c_j \lambda_j^k \mathbf{u}_j , \] and since |λj| ≤ ρ(H) < 1, we have \( \displaystyle \quad \lim_{k\to\infty} \mathbf{H}^k \mathbf{z} = 0 , \quad \) as required.
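A small numerical illustration of this decay (the matrix H and vector z below are our own illustrative choices; H has two distinct eigenvalues, hence independent eigenvectors):

```python
# Numerical illustration of Theorem 2: with ρ(H) < 1 the powers H^k z
# shrink to zero.  H below has eigenvalues ≈ 0.73 and ≈ −0.03 (our choice).
H = [[0.5, 0.4], [0.3, 0.2]]

def matvec(M, v):
    # 2x2 matrix-vector product, written out componentwise
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

z = [1.0, -2.0]
for k in range(200):
    z = matvec(H, z)

print(max(abs(z[0]), abs(z[1])))   # essentially 0
```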
To complete the proof in the general case, we need to exploit the so-called Jordan normal form of matrix H, namely, H = S−1JS, where S is an invertible matrix and J is a block diagonal matrix consisting of the Jordan blocks, \[ \mathbf{J} = \begin{bmatrix} \fbox{J_1} &&& \\ & \fbox{J_2} && \\ &&\ddots& \\ &&& \fbox{J_m} \end{bmatrix} , \] \[ \mathbf{J}_i = \begin{bmatrix} \lambda_i & 1 &0&\cdots & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 \\ &&&\ddots& \\ 0&0&0& \cdots & \lambda_i \end{bmatrix} \in \mathbb{R}^{n_i \times n_i} . \] To prove that Jki → 0 for |λi| < 1, we split the Jordan block into the sum of a diagonal matrix and a nilpotent matrix, namely, Ji = λiI + P. Notice that Ps = 0 for some positive integer s ≤ ni. Then we evaluate the terms of the binomial expansion \[ \left( \lambda_i \mathbf{I} + \mathbf{P} \right)^k = \sum_{j=0}^{n_i -1} \binom{k}{j} \lambda_i^{k - j} \mathbf{P}^j . \] Since the binomial coefficients are polynomials in k, their products with λik−j tend to zero as k → ∞ because |λi| < 1.
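The competition between the polynomial binomial coefficient and the geometric factor λᵏ⁻ʲ can be seen numerically; here is a sketch for a single Jordan block (block size 3 and λ = 0.9 are our own choices):

```python
from math import comb

# Entries of the k-th power of a single 3x3 Jordan block J = λI + P:
# (λI + P)^k = Σ_j C(k, j) λ^{k-j} P^j, with P^3 = 0, so only j = 0, 1, 2 survive.
lam = 0.9

def jordan_power_entry(k, j):
    # entry on the j-th superdiagonal of J^k
    return comb(k, j) * lam ** (k - j)

# The polynomial growth of C(k, j) wins at first, but λ^{k-j} eventually dominates:
print(jordan_power_entry(40, 2))    # transient growth: ≈ 14
print(jordan_power_entry(400, 2))   # eventual decay: ≈ 5e-14
```

Note the transient growth before decay: powers of a non-diagonalizable matrix with ρ < 1 can grow substantially before they tend to zero.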
Corollary 1: For a square matrix H, Hkz converges to some limit vector for every choice of z if and only if all eigenvalues of H are less than 1 in modulus, except possibly the eigenvalue λ = 1, which must be semisimple (all of its Jordan blocks have size 1).
Corollary 2: Let x✶ be a solution of A x = b satisfying x✶ = H x✶ + v. Suppose that we are given an iterative scheme: \[ \mathbf{x}^{(k+1)} = \mathbf{H}\,\mathbf{x}^{(k)} + \mathbf{v} , \qquad \mathbf{x}^{(0)}, \ \mathbf{v} \in \mathbb{R}^{n\times 1} . \] Then x(k) → x✶ for any choice of x(0) if and only if ρ(H) < 1.
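A sketch of Corollary 2 in action (H, v, and the starting vector below are our own illustrative choices; H has eigenvalues 0.5 and 0.2, so ρ(H) = 0.5 < 1, and the fixed point of x = Hx + v is (2, 2)):

```python
# Corollary 2 in action: x^{(k+1)} = H x^{(k)} + v with ρ(H) = 0.5 < 1.
H = [[0.4, 0.1], [0.2, 0.3]]
v = [1.0, 1.0]

x = [100.0, -50.0]                  # deliberately far-off starting vector
for _ in range(100):
    x = [H[0][0] * x[0] + H[0][1] * x[1] + v[0],
         H[1][0] * x[0] + H[1][1] * x[1] + v[1]]

# The iterates converge to the unique solution of (I - H) x = v, namely (2, 2).
print(x)   # approximately [2.0, 2.0]
```

The same limit is reached from any starting vector, exactly as the corollary asserts.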
Lemma 1: Let A be a singular square matrix having eigenvector x corresponding to the zero eigenvalue. For any (nonsingular) preconditioner S, splitting A = S + (A − S) generates matrix B = I − S−1A that has the same eigenvector x corresponding to eigenvalue λ = 1.
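Lemma 1 is easy to verify numerically; a minimal sketch (the singular matrix A and preconditioner S = 2I are our own hypothetical choices):

```python
# Lemma 1 check: A is singular with A x = 0 for x = (1, 1); for the
# preconditioner S = 2I, the matrix B = I − S⁻¹A fixes the same x.
A = [[1.0, -1.0], [-1.0, 1.0]]
x = [1.0, 1.0]                      # eigenvector of A for the zero eigenvalue

# S = 2I, so S⁻¹A = A / 2 and B = I − A/2
B = [[1 - A[0][0] / 2, -A[0][1] / 2],
     [-A[1][0] / 2, 1 - A[1][1] / 2]]

Bx = [B[0][0] * x[0] + B[0][1] * x[1],
      B[1][0] * x[0] + B[1][1] * x[1]]
print(Bx)   # [1.0, 1.0] — x is an eigenvector of B with eigenvalue 1
```

This is why a singular A dooms the corresponding iteration: by Corollary 2, B having eigenvalue 1 with this eigenvector prevents ρ(B) < 1.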
Theorem 3 (V. Dobrushkin, 2025): Let H be a square matrix with a simple eigenvalue λ₁ = 1 corresponding to the eigenvector u₁, and suppose all other eigenvalues of H have modulus strictly less than 1. If β is a basis of ℝn×1 that includes u₁, then any vector z in ℝn×1 admits a unique decomposition z = c₁u₁ + ⋯, where the remaining terms lie in the invariant subspace associated with the eigenvalues of modulus less than 1. Under repeated application of H, the influence of these subdominant components vanishes, and the iterates converge: \( \displaystyle \quad \lim_{k\to\infty} \mathbf{H}^k \mathbf{z} = c_1 \mathbf{u}_1. \)
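A numerical sketch of Theorem 3 (the matrix H below is our own example: its eigenvalues are 1, with eigenvector u₁ = (3, 1), and 0.6):

```python
# Theorem 3 demo: H has simple eigenvalue 1 with eigenvector u₁ = (3, 1);
# the other eigenvalue, 0.6, lies inside the unit circle, so H^k z
# converges to a multiple of u₁.
H = [[0.9, 0.3], [0.1, 0.7]]

z = [1.0, 1.0]
for _ in range(100):
    z = [H[0][0] * z[0] + H[0][1] * z[1],
         H[1][0] * z[0] + H[1][1] * z[1]]

# Column sums of H are 1, so z[0] + z[1] is preserved by the iteration;
# the limit is c₁ u₁ = 0.5 · (3, 1) = (1.5, 0.5) for this z.
print(z)   # approximately [1.5, 0.5]
```

The subdominant component decays like 0.6ᵏ, so after 100 steps only the u₁-component survives.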
To complete the proof in the general case, we need to exploit the so-called Jordan normal form of matrix {\bf H}, namely, $\mathbf{H} = \mathbf{S}^{-1}\mathbf{J}\mathbf{S}$, where $\mathbf{J}$ is a block diagonal matrix consisting of the Jordan blocks. The rest of the proof is straightforward and follows \cite{FW60}; it is based on the property that $\mathbf{J}_i^k \to 0$ for $i=2,3,\ldots , m$.
- Darve, E. and Wootters, M., Numerical Linear Algebra with Julia, SIAM, Philadelphia, 2021.
- Forsythe, G.E. and Wasow, W.R., Finite Difference Methods for Partial Differential Equations, John Wiley & Sons, New York, 1960.
- Loehr, N., Advanced Linear Algebra, CRC Press, 2014.
- Watkins, D.S., Fundamentals of Matrix Computations, 3rd edition, Wiley, 2010.
- White, R.E., Computational Linear Algebra with Applications and MATLAB Computations, CRC Press, Boca Raton, FL, 2023. doi: 10.1201/9781003304128
