While solving a problem numerically, there are two sources of possible errors. First, there may be errors due to inaccurate input data. Second, there are errors caused by the algorithm itself, because of approximations made within the calculations. In order to estimate the errors in the computed answers from both these sources, we need to understand how much the solutions change (in this case we say that they are perturbed) if the input data are slightly perturbed.
Condition Numbers
Let us consider a linear system of equations that we write in the concise form A x = b, where A is a square invertible matrix (this condition guarantees a unique solution), x is a column vector of unknowns, and b is a given column vector. Now suppose we add a small vector δb to b and consider the perturbed system A z = b + δb. This system also has a unique solution z, which we hope is not far away from x. Let δx denote the difference between z and x, so z = x + δx.
In order to quantify the size of vectors, we introduce a vector norm, ∥⋅∥. It does not matter which norm you use---all norms are equivalent in finite-dimensional vector spaces. We will mostly use the Euclidean norm, denoted ∥⋅∥2. It makes sense to speak about the small quantity δx in relative terms: when we say that δx is small, we mean that it is small in comparison with x. Then the size of δx relative to x is given by ∥δx∥ / ∥x∥, and the size of δb relative to b is given by
∥δb∥ / ∥b∥
The equations A x = b and A(x + δx) = b + δb imply that A(δx) = δb. Since the matrix A is invertible, we get δx = A−1(δb). Whatever vector norm has been chosen, we can use the induced matrix norm to measure matrices. The latter equation then gives ∥δx∥ ≤ ∥A−1∥ ∥δb∥, while b = A x gives ∥b∥ ≤ ∥A∥ ∥x∥. The factor ∥A∥ ∥A−1∥ that appears when these estimates are combined is called the condition number of A and is denoted
\[
\kappa (\mathbf{A}) = \| \mathbf{A} \| \, \| \mathbf{A}^{-1} \| . \tag{1}
\]
Combining the two estimates provides a bound for ∥δx∥ / ∥x∥ in terms of ∥δb∥ / ∥b∥:
\[
\frac{\| \delta x \|}{\| x \|} \le \kappa (\mathbf{A}) \, \frac{\| \delta b \|}{\| b \|} . \tag{2}
\]
To some extent, the condition number is (very roughly) the rate at which the solution x changes with respect to a change in b.
From this inequality, it follows that if κ(A) is not too large, then a small (relative) perturbation in the vector b implies a small value of ∥δx∥ / ∥x∥. That is, the linear system A x = b is not overly sensitive to perturbations in the input data b. As a rule of thumb, if the condition number κ(A) = 10k, then you may lose up to k digits of accuracy on top of what is lost by the numerical method itself due to finite-precision arithmetic.
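As a quick numerical illustration of this bound, the following sketch (using NumPy, with the 2×2 matrix that appears in Example 1 below, and the same perturbation of 0.01 in the first component of b) compares the actual relative change in x with κ(A) times the relative change in b:

```python
import numpy as np

# Matrix and right-hand side from Example 1 below; perturb b by 0.01
# in the first component, as in the text.
A = np.array([[4.1, 1.4], [9.7, 3.3]])
b = np.array([1.4, 3.3])
db = np.array([0.01, 0.0])

x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

kappa = np.linalg.cond(A)                       # 2-norm condition number
rel_x = np.linalg.norm(dx) / np.linalg.norm(x)  # relative change in x
rel_b = np.linalg.norm(db) / np.linalg.norm(b)  # relative change in b

# The relative change in x is bounded by kappa times the relative change in b
print(rel_x <= kappa * rel_b)   # True
```

The bound is not tight here: the actual amplification is smaller than κ(A), but κ(A) is the worst case over all b and δb.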
If the condition number of matrix A is not too large, then the matrix is called well-conditioned. Otherwise, a small value of ∥δb∥ / ∥b∥ does not guarantee that ∥δx∥ / ∥x∥ will be small; this means that the system A x = b is potentially very sensitive to perturbations of b. Thus, if κ(A) is large, matrix A is said to be ill-conditioned.
Theorem 1:
For any induced matrix norm and for any invertible square matrix A, κ(A) ≥ 1.
Since the identity matrix can be written as I = A−1A, we get 1 = ∥I∥ ≤ ∥A−1∥ ∥A∥ = κ(A).
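The theorem can also be checked empirically; here is a minimal sketch with random matrices (the 4×4 size and the choice of norms are arbitrary):

```python
import numpy as np

# Empirical check that kappa(A) >= 1 in several induced norms
# (random 4x4 matrices; size and norms are arbitrary choices).
rng = np.random.default_rng(0)
for _ in range(100):
    A = rng.standard_normal((4, 4))
    Ainv = np.linalg.inv(A)
    for p in (1, 2, np.inf):
        kappa = np.linalg.norm(A, p) * np.linalg.norm(Ainv, p)
        assert kappa >= 1.0
print("kappa(A) >= 1 in every trial")
```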
Example 1:
Let us consider the following matrix
\[
\mathbf{A} = \begin{bmatrix} 4.1 & 1.4 \\ 9.7 & 3.3 \end{bmatrix} .
\]
Let us take input data b to be the second column of A, so the solution to A x = b is simply x = [0, 1]T.
A = {{4.1, 1.4}, {9.7, 3.3}};
b = {1.4, 3.3};
LinearSolve[A, b]
{0., 1.}
Now add 0.01 to the first component of b.
b2 = {1.41, 3.3};
LinearSolve[A, b2]
{-0.66, 2.94}
The solution changes dramatically. To calculate the condition number, we enter in a Mathematica notebook:
SingularValueList[A]
{11.1243, 0.00449467}
Then the ratio of the largest and smallest singular values gives us the required numerical value:
11.124296822630027/0.0044946661166292015
2475
Since the condition number κ(A) = 2475 is large, we claim that the given matrix is ill-conditioned.
With Mathematica, we verify formula (1).
A = {{4.1, 1.4}, {9.7, 3.3}};
AI = Inverse[A]
{{-66., 28.}, {194., -82.}}
The condition number in the 2-norm is
kappa2 = Norm[A, 2] * Norm[AI, 2]
2475
It is the same with the Frobenius norm:
kappaF = Norm[A, "Frobenius"] * Norm[AI, "Frobenius"]
2475
With the infinity norm, we have
kappaI = Norm[A, Infinity] * Norm[AI, Infinity]
3588
It is the same as the 1-norm:
kappa1 = Norm[A, 1] * Norm[AI, 1]
3588
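As a cross-check, the same four condition numbers can be reproduced in NumPy (np.linalg.norm with p equal to 2, 'fro', inf, or 1 gives the corresponding matrix norms):

```python
import numpy as np

# Reproduce the four condition numbers computed above with NumPy
A = np.array([[4.1, 1.4], [9.7, 3.3]])
Ainv = np.linalg.inv(A)
for p in (2, 'fro', np.inf, 1):
    kappa = np.linalg.norm(A, p) * np.linalg.norm(Ainv, p)
    print(p, round(kappa))   # 2475, 2475, 3588, 3588
```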
■
End of Example 1
Of course, the condition number depends on the choice of vector norm---all norms on a finite-dimensional space are equivalent, meaning each is bounded above and below by constant multiples of any other. The most appropriate for our needs is the Euclidean norm, or 2-norm, inherited from the Hilbert space ℓ², which leads to
\[
\kappa_2 (\mathbf{A}) = \| \mathbf{A} \|_2 \, \| \mathbf{A}^{-1} \|_2 = \frac{\sigma_{\max}}{\sigma_{\min}} ,
\]
where σmax and σmin are the largest and smallest singular values of A, respectively. When A is normal (in particular, symmetric), this ratio equals |λmax| / |λmin|, where λmax and λmin are the maximal and minimal (in absolute value) eigenvalues of A.
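Both forms are easy to verify numerically; below, the first check uses the matrix of Example 1, and the symmetric matrix S is a made-up example used only to illustrate the normal case:

```python
import numpy as np

# kappa_2 as a ratio of singular values, for the matrix of Example 1
A = np.array([[4.1, 1.4], [9.7, 3.3]])
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
kappa_svd = s[0] / s[-1]
print(round(kappa_svd))                  # 2475
assert np.isclose(kappa_svd, np.linalg.cond(A, 2))

# For a normal (here symmetric) matrix the eigenvalue form agrees;
# S is a hypothetical symmetric matrix chosen for this check.
S = np.array([[2.0, 1.0], [1.0, 3.0]])
lam = np.linalg.eigvalsh(S)
assert np.isclose(abs(lam).max() / abs(lam).min(), np.linalg.cond(S, 2))
```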
For the vector 1-norm, the corresponding induced matrix norm of A = [𝑎i,j] is the maximum absolute column sum:
\[
\| \mathbf{A} \|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} \left\vert a_{i,j} \right\vert .
\]
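A short check of the column-sum formula, again with the matrix of Example 1:

```python
import numpy as np

# Maximum absolute column sum, checked against the induced 1-norm
A = np.array([[4.1, 1.4], [9.7, 3.3]])
col_sums = np.abs(A).sum(axis=0)   # column sums: approximately 13.8 and 4.7
print(col_sums.max())              # approximately 13.8
assert np.isclose(col_sums.max(), np.linalg.norm(A, 1))
```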
There are many different condition numbers. In general, a condition
number applies not only to a particular matrix, but also to the problem being solved.
For instance, let f(x) be a real-valued differentiable function of a real variable x. Suppose we are given f(x + δx) for some perturbed value of x instead of the required value f(x). The best that we can do (without more information) is to try to bound the absolute error |f(x + δx) − f(x)|. We may use a simple linear approximation to f to get the estimate f(x + δx) ≈ f(x) + δx f′(x), and so the error becomes |f(x + δx) − f(x)| ≈ |δx| ⋅ |f′(x)|. We call |f′(x)| the absolute condition number of f at x. If |f′(x)| is large enough, then the error may be large even if δx is small; in this case, we call f ill-conditioned at x.
We can similarly estimate the relative error and define the relative condition number (or often just condition number for short).
\[
\frac{\left\vert f(x + \delta x) - f(x) \right\vert}{\left\vert f(x) \right\vert} \approx \frac{\left\vert \delta x \right\vert}{\left\vert x \right\vert} \cdot \frac{\left\vert f' (x) \right\vert \cdot |x|}{\left\vert f(x) \right\vert} .
\]
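A small sketch of these condition numbers for a concrete function (f(x) = exp(x), a choice made here purely for illustration):

```python
import math

# Relative condition number |x f'(x) / f(x)|, sketched for f(x) = exp(x)
# (the choice of f is ours, purely for illustration)
def rel_cond(f, fprime, x):
    return abs(x * fprime(x) / f(x))

# For f = exp we have f'/f = 1, so the relative condition number is |x|:
# relative errors in x are amplified by a factor |x| in exp(x).
x, dx = 10.0, 1e-8
lhs = abs(math.exp(x + dx) - math.exp(x)) / math.exp(x)
rhs = (dx / x) * rel_cond(math.exp, math.exp, x)
assert math.isclose(rel_cond(math.exp, math.exp, x), 10.0)
assert math.isclose(lhs, rhs, rel_tol=1e-4)
```

So exp is relatively ill-conditioned for large |x|, even though its absolute condition number |f′(x)| tells a different story near x = 0.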
The condition number κ(A) also appears in the bound for how much a change E = δA in a matrix A can affect its inverse:
\[
\frac{\left\| \left( \mathbf{A} + \mathbf{E} \right)^{-1} - \mathbf{A}^{-1} \right\|}{\left\| \mathbf{A}^{-1} \right\|} \le \kappa (\mathbf{A}) \, \frac{\| \mathbf{E} \|}{\| \mathbf{A} \|} + O \left( \| \mathbf{E} \|^2 \right) .
\]
The condition number of a matrix A, typically denoted \kappa(A), measures how sensitive the solution x of the linear system Ax = b is to small changes or errors in the input data. It quantifies the worst-case amplification of relative errors from input to output. If \kappa(A) is close to 1, the system is said to be well-conditioned, meaning that small perturbations in A or b lead to proportionally small changes in the solution. Conversely, a large condition number indicates an ill-conditioned system, where even tiny errors can cause significant deviations in the solution.
Historically, the concept of conditioning emerged in the early days of numerical analysis, particularly through the work of Alan Turing and James H. Wilkinson. Turing, in his 1948 paper on rounding errors in matrix computations, was among the first to formalize the idea that certain problems are inherently unstable under finite precision arithmetic. Wilkinson later expanded this framework in the 1960s, emphasizing the role of condition numbers in understanding numerical stability and error propagation. His work on backward error analysis and the sensitivity of eigenvalue problems remains foundational.
Pedagogically, it’s helpful to interpret \kappa(A) as a bridge between theory and computation. For example, in the case of solving A x = b, the condition number in the 2-norm is given by:
\[
\kappa_2(A) = \|A\|_2 \cdot \|A^{-1}\|_2
\]
This expression captures how much the geometry of the transformation A distorts space. A matrix that stretches some directions much more than others will have a large condition number, reflecting potential instability. In practice, systems with \kappa(A) \lesssim 10^2 are often considered numerically safe, while those with \kappa(A) \gg 10^6 may require regularization, preconditioning, or reformulation.
Understanding condition numbers is essential not only for solving linear systems but also for interpreting the reliability of computed solutions in optimization, differential equations, and data science. It’s a cornerstone of numerical literacy.
Example 2:
The Hilbert matrix H_n, defined by H_{ij} = \frac{1}{i + j - 1}, is a textbook example of an ill-conditioned matrix. Despite being symmetric and positive definite, its condition number grows exponentially with n. For instance:
\kappa(H_5) \approx 4.8 \times 10^5
\kappa(H_{10}) \approx 1.6 \times 10^{13}
This example was famously studied by Wilkinson, who showed that solving H_n x = b using Gaussian elimination without pivoting leads to catastrophic error propagation. It’s a powerful demonstration of how conditioning—not just algorithmic stability—determines numerical reliability.
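This growth is easy to reproduce; the sketch below builds H_n directly from its definition, and the printed values should come out near the figures quoted above:

```python
import numpy as np

# Hilbert matrix H_n with entries 1/(i + j - 1) (for 1-based indices i, j)
def hilbert(n):
    i, j = np.indices((n, n))    # 0-based index grids
    return 1.0 / (i + j + 1)     # equals 1/(i' + j' - 1) with i' = i+1, j' = j+1

for n in (5, 10):
    print(n, np.linalg.cond(hilbert(n)))
```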
■
End of Example 2
Example 3:
In solving overdetermined systems Ax \approx b via least squares, the condition number of the normal-equations matrix A^T A (in the 2-norm, for full-rank A) is exactly the square of that of A:
\kappa_2(A^T A) = \kappa_2(A)^2
This squaring effect means that even moderately ill-conditioned data matrices can lead to highly unstable normal equations. That’s why QR decomposition or SVD is preferred in practice—they avoid squaring the condition number and offer better numerical stability.
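The squaring effect can be observed numerically; the tall random matrix below is a hypothetical example (the 20×5 shape is an arbitrary choice):

```python
import numpy as np

# kappa_2(A^T A) = kappa_2(A)^2, demonstrated on a random tall matrix
# (the 20x5 shape is an arbitrary choice for illustration)
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
kA = np.linalg.cond(A)           # sigma_max / sigma_min of A
kAtA = np.linalg.cond(A.T @ A)   # condition number of the normal equations
print(kA ** 2, kAtA)             # the two numbers agree
assert np.isclose(kAtA, kA ** 2, rtol=1e-8)
```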
■
End of Example 3
Example 4:
In modern machine learning, especially deep learning, condition numbers appear in the analysis of Hessians and Jacobians. Poor conditioning of the loss landscape can slow convergence or trap optimization in saddle points. Techniques like batch normalization, adaptive learning rates, and preconditioning are partly motivated by the need to improve conditioning.
■
End of Example 4
Example 5:
In computational physics or engineering, condition numbers quantify how sensitive a model is to measurement errors. For example, in finite element simulations, poorly conditioned stiffness matrices can arise from mesh irregularities or material contrasts. Engineers often use scaling or domain decomposition to mitigate this.
■
End of Example 5
Example 6:
For a matrix A, the condition number in the 2-norm equals the ratio of the largest to smallest singular value:
\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}
This means that A stretches space unevenly—some directions are amplified, others compressed. If \sigma_{\min} is close to zero, small perturbations can flip the solution dramatically. This geometric view is especially useful when teaching students about the stability of linear transformations.
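This stretching is easy to see in code: A maps each right singular direction v_k to σ_k u_k, so the norms ∥A v_k∥ recover the singular values. The nearly singular 2×2 matrix below is the one used in the perturbation example later in this section:

```python
import numpy as np

# A maps the right singular directions v_k to sigma_k * u_k,
# so ||A v_k|| equals the k-th singular value.
A = np.array([[1.0, 1.0], [1.0, 1.0001]])
U, s, Vt = np.linalg.svd(A)

for k in range(2):
    v = Vt[k]                                 # unit input direction
    assert np.isclose(np.linalg.norm(A @ v), s[k])

print(s[0] / s[1])    # kappa_2(A), roughly 4e4
```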
■
End of Example 6
Example: Sensitivity to Perturbation
1. Well-Conditioned System
Let’s take:
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
The solution is clearly x = A^{-1}b = b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
The condition number \kappa_2(A) = 1 (perfectly conditioned)
Now perturb b slightly:
\[
\tilde{b} = \begin{bmatrix} 1.01 \\ 0.99 \end{bmatrix}
\Rightarrow \tilde{x} = A^{-1} \tilde{b} = \tilde{b}
\]
Relative change in b:
\frac{\|\tilde{b} - b\|_2}{\|b\|_2} = \frac{\sqrt{(0.01)^2 + (-0.01)^2}}{\sqrt{1^2 + 1^2}} = \frac{\sqrt{2} \cdot 0.01}{\sqrt{2}} = 0.01
Relative change in x: same as above, 1%
The output changes proportionally to the input — stable behavior.
2. Ill-Conditioned System
Now let:
\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 1.0001 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ 2.0001 \end{bmatrix}
\]
The exact solution is x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
But \kappa_2(A) \approx 4 \times 10^4, so the matrix is very ill-conditioned.
Now perturb b slightly:
\tilde{b} = \begin{bmatrix} 2 \\ 2.0002 \end{bmatrix}
Solving A \tilde{x} = \tilde{b}, we get:
\tilde{x} \approx \begin{bmatrix} 0 \\ 2 \end{bmatrix}
Relative change in b:
\frac{\|\tilde{b} - b\|_2}{\|b\|_2} \approx \frac{0.0001}{\sqrt{2^2 + 2.0001^2}} \approx 0.000035
Relative change in x:
\frac{\|\tilde{x} - x\|_2}{\|x\|_2} = \frac{\sqrt{1^2 + 1^2}}{\sqrt{1^2 + 1^2}} = 1
⚠️ A tiny perturbation in b (0.0035%) caused a 100% change in the solution — a dramatic amplification due to the high condition number.
🧠 Takeaway
This example shows that even when using a stable algorithm (like Gaussian elimination with partial pivoting), the inherent conditioning of the matrix governs how trustworthy the solution is. In ill-conditioned systems, numerical solutions can be wildly inaccurate even with tiny input errors — a key insight from Wilkinson’s work.
import numpy as np
# Ill-conditioned matrix
A = np.array([[1, 1], [1, 1.0001]])
b = np.array([2, 2.0001])
b_perturbed = np.array([2, 2.0002])
# Solve original and perturbed systems
x = np.linalg.solve(A, b)
x_perturbed = np.linalg.solve(A, b_perturbed)
# Condition number
cond_A = np.linalg.cond(A)
# Relative changes
rel_change_b = np.linalg.norm(b_perturbed - b) / np.linalg.norm(b)
rel_change_x = np.linalg.norm(x_perturbed - x) / np.linalg.norm(x)
# Output
print("Condition number of A:", cond_A)
print("Original solution x:", x)
print("Perturbed solution x_perturbed:", x_perturbed)
print("Relative change in b:", rel_change_b)
print("Relative change in x:", rel_change_x)
% Ill-conditioned matrix
A = [1, 1; 1, 1.0001];
b = [2; 2.0001];
b_perturbed = [2; 2.0002];
% Solve original and perturbed systems
x = A \ b;
x_perturbed = A \ b_perturbed;
% Condition number
cond_A = cond(A);
% Relative changes
rel_change_b = norm(b_perturbed - b) / norm(b);
rel_change_x = norm(x_perturbed - x) / norm(x);
% Output
fprintf('Condition number of A: %.2f\n', cond_A);
fprintf('Original solution x: [%f, %f]\n', x(1), x(2));
fprintf('Perturbed solution x_perturbed: [%f, %f]\n', x_perturbed(1), x_perturbed(2));
fprintf('Relative change in b: %.6f\n', rel_change_b);
fprintf('Relative change in x: %.6f\n', rel_change_x);