Chapter 2

Linear Equations

Matrices were invented to solve systems of equations. One compact equation, Ax = b, replaces any number of simultaneous linear equations, and one algorithm solves them all.

From a system to a single equation

Take three equations in three unknowns:

\begin{aligned} 2x + y - z &= 8 \\ -3x - y + 2z &= -11 \\ -2x + y + 2z &= -3 \end{aligned}

Strip out the coefficients into a matrix \(A\), the unknowns into a vector \(x\), and the right-hand sides into a vector \(b\):

\underbrace{\begin{bmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x \\ y \\ z \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 8 \\ -11 \\ -3 \end{bmatrix}}_{b}

The entire system is now \(Ax = b\). This is more than notation. It reframes the question geometrically: which vector \(x\) does the matrix \(A\) transform into \(b\)?
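To make the matrix form concrete, here is a minimal NumPy sketch. It assumes NumPy is available; the array names simply mirror the symbols above. It checks that the product of \(A\) with a candidate vector reproduces \(b\), using the values derived in the worked example later in this chapter.

```python
import numpy as np

# Coefficient matrix A and right-hand side b, copied from the system above.
A = np.array([[ 2,  1, -1],
              [-3, -1,  2],
              [-2,  1,  2]])
b = np.array([8, -11, -3])

# Candidate solution: the values found in the worked example later on.
x = np.array([2, 3, -1])

# A @ x evaluates all three left-hand sides at once:
# entry i is row i of A dotted with x.
lhs = A @ x
print(lhs.tolist())             # [8, -11, -3]
print(np.array_equal(lhs, b))   # True
```

Each entry of `A @ x` is one equation's left-hand side, so comparing the product with `b` tests all three equations in a single step.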

Gaussian elimination

Gaussian elimination is the universal solving algorithm. It relies on three row operations that never change the solution set:

Operation	Example
Swap two rows	R₁ ↔ R₂
Multiply a row by a nonzero constant	R₂ → 3R₂
Add a multiple of one row to another	R₂ → R₂ + 1.5R₁

Apply them until the matrix is in row echelon form — a staircase of zeros below the diagonal — then back-substitute from the bottom up.
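The procedure can be sketched in a few lines of Python. This is an illustrative version, not a production solver. The function name `gaussian_solve` is made up for this example, it handles only square systems, and it uses exact `Fraction` arithmetic so the numbers match a hand calculation. It needs only two of the three row operations (swap and add a multiple); scaling a row is optional.

```python
from fractions import Fraction

def gaussian_solve(A, b):
    """Solve Ax = b for square A by elimination and back-substitution.

    Raises ValueError when no nonzero pivot can be found (A is singular),
    because then the system has no unique solution.
    """
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Row swap: find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular: no unique solution")
        M[col], M[pivot] = M[pivot], M[col]

        # Row addition: subtract a multiple of the pivot row from each row below.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]

    # Back-substitute from the bottom row up.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
print(gaussian_solve(A, b))  # [Fraction(2, 1), Fraction(3, 1), Fraction(-1, 1)]
```

Numerical libraries work in floating point and pick the largest available pivot rather than the first nonzero one, which limits rounding error. Exact fractions make that precaution unnecessary here.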

Worked example

To solve the system above, we work on the augmented matrix \([A \mid b]\):

\left[\begin{array}{ccc|c} 2 & 1 & -1 & 8 \\ -3 & -1 & 2 & -11 \\ -2 & 1 & 2 & -3 \end{array}\right]

Step 1. Eliminate the first column below the pivot. \(R_2 \to R_2 + \tfrac{3}{2}R_1\) and \(R_3 \to R_3 + R_1\):

\left[\begin{array}{ccc|c} 2 & 1 & -1 & 8 \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} & 1 \\ 0 & 2 & 1 & 5 \end{array}\right]

Step 2. Eliminate the second column below the pivot. \(R_3 \to R_3 - 4R_2\):

\left[\begin{array}{ccc|c} 2 & 1 & -1 & 8 \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} & 1 \\ 0 & 0 & -1 & 1 \end{array}\right]

Step 3. Back-substitute. The last row says \(-z = 1\), so \(z = -1\). The second row: \(\tfrac{1}{2}y + \tfrac{1}{2}(-1) = 1\), so \(y = 3\). The first row: \(2x + 3 - (-1) = 8\), so \(x = 2\).

x = 2, \quad y = 3, \quad z = -1
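The three steps can be replayed as array arithmetic. This is a sketch assuming NumPy, with each row operation written as an in-place update of the augmented matrix. Every intermediate value here is a multiple of one half, so floating point reproduces the hand calculation exactly.

```python
import numpy as np

# Augmented matrix [A | b] from the worked example, stored as floats.
M = np.array([[ 2.0,  1.0, -1.0,   8.0],
              [-3.0, -1.0,  2.0, -11.0],
              [-2.0,  1.0,  2.0,  -3.0]])

# Step 1: clear the first column below the pivot.
M[1] += 1.5 * M[0]   # R2 -> R2 + (3/2) R1
M[2] += M[0]         # R3 -> R3 + R1

# Step 2: clear the second column below the pivot.
M[2] -= 4 * M[1]     # R3 -> R3 - 4 R2

# Step 3: back-substitute, bottom row first.
z = M[2, 3] / M[2, 2]
y = (M[1, 3] - M[1, 2] * z) / M[1, 1]
x = (M[0, 3] - M[0, 1] * y - M[0, 2] * z) / M[0, 0]
print(x, y, z)  # 2.0 3.0 -1.0
```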

When does a solution exist?

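Every square system \(Ax = b\) lands in exactly one of three cases, and the determinant of \(A\) narrows down which. A nonzero determinant guarantees a unique solution; a zero determinant leaves the other two cases, decided by whether the equations are consistent: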

Case	Condition	Geometry (3 equations)
Unique solution	det(A) ≠ 0	Three planes meet at a single point
Infinitely many	det(A) = 0, system consistent	Planes intersect in a shared line or plane
No solution	det(A) = 0, system inconsistent	Planes never share a common point
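A determinant alone cannot separate the two singular cases, and in floating point a determinant that should be zero often comes out as a tiny nonzero number. The sketch below therefore swaps in a rank comparison (the Rouché–Capelli criterion) instead of the determinant test in the table. The function name `classify_system` and the two singular example systems are invented for illustration.

```python
import numpy as np

def classify_system(A, b):
    """Return 'unique', 'infinite', or 'none' for the square system Ax = b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.hstack([A, b]))
    if rank_A < rank_aug:
        return "none"       # b lies outside the column space: inconsistent
    if rank_A == n:
        return "unique"     # full rank, equivalent to det(A) != 0
    return "infinite"       # consistent but rank-deficient

# The chapter's system: det(A) = -1, so the solution is unique.
A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
print(classify_system(A, [8, -11, -3]))   # unique

# Made-up singular matrix: the third row is the sum of the first two.
S = [[1, 0, 1], [0, 1, 1], [1, 1, 2]]
print(classify_system(S, [1, 2, 3]))      # infinite
print(classify_system(S, [1, 2, 4]))      # none
```

For a square matrix, full rank and a nonzero determinant are the same condition, so the three return values line up with the three rows of the table.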

The inverse method

When \(A\) is invertible, the solution can be written in one line — multiply both sides by \(A^{-1}\):

Ax = b \quad\Rightarrow\quad x = A^{-1}b

Elegant on paper, but in practice software almost never computes \(A^{-1}\) explicitly — elimination (LU decomposition) is faster and numerically safer. The inverse is a concept you reason with, not an algorithm you run.
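The difference shows up directly in library usage. In this NumPy sketch, `np.linalg.solve` factors \(A\) and solves the system in one call (NumPy delegates this to an LU-based LAPACK routine), while the second line follows the formula literally and forms the full inverse first. On a small, well-conditioned system like this one the two agree.

```python
import numpy as np

A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

# Preferred: factor A and solve, without ever building the inverse.
x_solve = np.linalg.solve(A, b)

# The textbook formula x = A^{-1} b: computes the whole inverse, then multiplies.
x_inv = np.linalg.inv(A) @ b

print(np.allclose(x_solve, [2, 3, -1]))  # True
print(np.allclose(x_solve, x_inv))       # True
```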

Why AI cares

Training a machine learning model is, at heart, solving for unknowns that satisfy a system of constraints. Linear regression — the ancestor of every neural network — has an exact matrix solution called the normal equation:

\theta = (X^T X)^{-1} X^T y

Every symbol there is an operation from Chapter 1: transpose, multiplication, inverse. When the system becomes too large for exact solutions, AI switches to iterative approximation (gradient descent) — but the unknowns, the data, and the gradients all remain matrices. More on this in Matrices in AI.
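As a small illustration of both routes, the sketch below fits a straight line to made-up data that lies exactly on \(y = 1 + 2x\). The data, the learning rate, and the iteration count are assumptions chosen for this toy case, not recommended settings. The normal equation is solved as the linear system \((X^T X)\theta = X^T y\) rather than by forming the inverse, and gradient descent on the mean squared error arrives at the same \(\theta\).

```python
import numpy as np

# Made-up data for illustration: five points lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix X: a column of ones (intercept) next to the feature column.
X = np.column_stack([np.ones_like(x), x])

# Normal equation, solved as the linear system (X^T X) theta = X^T y.
theta_exact = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean squared error, starting from zero.
theta_gd = np.zeros(2)
learning_rate = 0.05   # hypothetical step size, small enough for this data
for _ in range(5000):
    gradient = (2.0 / len(y)) * (X.T @ (X @ theta_gd - y))
    theta_gd -= learning_rate * gradient

print(np.allclose(theta_exact, [1.0, 2.0]))            # True
print(np.allclose(theta_gd, theta_exact, atol=1e-6))   # True
```

The exact route costs one linear solve; the iterative route needs only matrix-vector products per step, which is why it scales to problems where forming and solving \(X^T X\) is impractical.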