From a system to a single equation
Take three equations in three unknowns:
\[
\begin{aligned}
2x + y - z &= 8 \\
-3x - y + 2z &= -11 \\
-2x + y + 2z &= -3
\end{aligned}
\]
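Collect the coefficients into a matrix \(A\), the unknowns into a vector \(x\) (whose entries are the scalars \(x\), \(y\), \(z\)), and the right-hand sides into a vector \(b\):
\[
A = \begin{bmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{bmatrix}, \quad
x = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad
b = \begin{bmatrix} 8 \\ -11 \\ -3 \end{bmatrix}
\]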
The entire system is now \(Ax = b\). This is more than notation. It reframes the question geometrically: which vector \(x\) does the matrix \(A\) transform into \(b\)?
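To make the "which \(x\) does \(A\) send to \(b\)" reading concrete, here is a minimal sketch in plain Python. The helper name `matvec` is illustrative, and the coefficients are the ones implied by the worked example later in this section.

```python
# Coefficients implied by the worked example in this section.
A = [
    [2, 1, -1],
    [-3, -1, 2],
    [-2, 1, 2],
]
b = [8, -11, -3]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


x = [2, 3, -1]             # candidate solution (x, y, z)
print(matvec(A, x))        # [8, -11, -3]
print(matvec(A, x) == b)   # True: A transforms this x into b
```

Each entry of the product is one equation of the system evaluated at the candidate, so checking `Ax == b` checks all three equations at once.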
Gaussian elimination
Gaussian elimination is the general-purpose algorithm for solving linear systems. It rests on three row operations, none of which changes the solution set:
| Operation | Example |
|---|---|
| Swap two rows | R₁ ↔ R₂ |
| Multiply a row by a nonzero constant | R₂ → 3R₂ |
| Add a multiple of one row to another | R₂ → R₂ + 1.5R₁ |
Apply them until the matrix is in row echelon form — a staircase of zeros below the diagonal — then back-substitute from the bottom up.
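The procedure can be sketched in a few lines of Python. This is one possible implementation, not a production solver: it assumes a square system with a unique solution, uses exact fractions to sidestep rounding, and the function name `gaussian_solve` is made up for illustration.

```python
from fractions import Fraction


def gaussian_solve(A, b):
    """Solve Ax = b for square A by elimination and back-substitution.

    Raises ValueError if some column has no usable pivot (singular matrix).
    """
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Row swap: bring a row with a nonzero entry into the pivot position.
        pivot_row = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot_row is None:
            raise ValueError("matrix is singular: no unique solution")
        M[col], M[pivot_row] = M[pivot_row], M[col]

        # Add a multiple of the pivot row to each row below it,
        # chosen so the entry under the pivot becomes zero.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [m_r - factor * m_p for m_r, m_p in zip(M[r], M[col])]

    # Back-substitution, from the last row up.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


solution = gaussian_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print([str(v) for v in solution])  # ['2', '3', '-1']
```

Only two of the three row operations appear here. Scaling a row is never required to reach row echelon form; it is mainly a convenience for making pivots equal to 1.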
Worked example
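We now solve the system above, working on the augmented matrix \([A \mid b]\):
\[
\left[\begin{array}{ccc|c}
2 & 1 & -1 & 8 \\
-3 & -1 & 2 & -11 \\
-2 & 1 & 2 & -3
\end{array}\right]
\]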
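Step 1. Eliminate the first column below the pivot. \(R_2 \to R_2 + \tfrac{3}{2}R_1\) and \(R_3 \to R_3 + R_1\):
\[
\left[\begin{array}{ccc|c}
2 & 1 & -1 & 8 \\
0 & \tfrac{1}{2} & \tfrac{1}{2} & 1 \\
0 & 2 & 1 & 5
\end{array}\right]
\]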
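Step 2. Eliminate the second column below the pivot. \(R_3 \to R_3 - 4R_2\):
\[
\left[\begin{array}{ccc|c}
2 & 1 & -1 & 8 \\
0 & \tfrac{1}{2} & \tfrac{1}{2} & 1 \\
0 & 0 & -1 & 1
\end{array}\right]
\]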
Step 3. Back-substitute. The last row says \(-z = 1\), so \(z = -1\). The second row: \(\tfrac{1}{2}y + \tfrac{1}{2}(-1) = 1\), so \(y = 3\). The first row: \(2x + 3 - (-1) = 8\), so \(x = 2\).
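The three steps can be replayed mechanically. The sketch below applies exactly the row operations named above and prints the matrix after each stage. The starting entries are those implied by the steps, and the helper names `add_multiple` and `show` are illustrative.

```python
from fractions import Fraction as F

# Augmented matrix [A | b]; entries are those implied by the steps above.
M = [
    [F(2), F(1), F(-1), F(8)],
    [F(-3), F(-1), F(2), F(-11)],
    [F(-2), F(1), F(2), F(-3)],
]


def add_multiple(M, target, source, factor):
    """Row operation: R_target -> R_target + factor * R_source (in place)."""
    M[target] = [t + factor * s for t, s in zip(M[target], M[source])]


def show(label, M):
    """Print the matrix with a label, one row per line."""
    print(label)
    for row in M:
        print("  ", [str(v) for v in row])


# Step 1: clear the first column below the pivot.
add_multiple(M, 1, 0, F(3, 2))   # R2 -> R2 + (3/2) R1
add_multiple(M, 2, 0, F(1))      # R3 -> R3 + R1
show("after step 1", M)

# Step 2: clear the second column below the pivot.
add_multiple(M, 2, 1, F(-4))     # R3 -> R3 - 4 R2
show("after step 2", M)

# Step 3: back-substitute from the bottom row up.
z = M[2][3] / M[2][2]
y = (M[1][3] - M[1][2] * z) / M[1][1]
x = (M[0][3] - M[0][1] * y - M[0][2] * z) / M[0][0]
print(x, y, z)  # 2 3 -1
```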
When does a solution exist?
Every square system \(Ax = b\) lands in exactly one of three cases, and the determinant of \(A\) tells you which are possible:
| Case | Condition | Geometry (3 equations) |
|---|---|---|
| Unique solution | det(A) ≠ 0 | Three planes meet at a single point |
| Infinitely many | det(A) = 0, system consistent | Planes intersect in a shared line or plane |
| No solution | det(A) = 0, system inconsistent | Planes never share a common point |
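The determinant alone cannot separate the two \(\det(A) = 0\) rows of the table, since that depends on \(b\) as well. One common way to decide all three cases in code is to compare the rank of \(A\) with the rank of the augmented matrix \([A \mid b]\) (the Rouché-Capelli criterion). The sketch below swaps in that rank test in place of the determinant; the function names `rank` and `classify` are illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows), computed by exact row reduction."""
    M = [[Fraction(v) for v in row] for row in rows]
    rank_count = 0
    n_cols = len(M[0]) if M else 0
    for col in range(n_cols):
        # Look for a nonzero entry at or below the next pivot position.
        pivot = next((r for r in range(rank_count, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[rank_count], M[pivot] = M[pivot], M[rank_count]
        for r in range(rank_count + 1, len(M)):
            factor = M[r][col] / M[rank_count][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[rank_count])]
        rank_count += 1
    return rank_count


def classify(A, b):
    """Return 'unique', 'infinite', or 'none' for the square system Ax = b."""
    n = len(A)
    augmented = [row + [rhs] for row, rhs in zip(A, b)]
    rank_A, rank_aug = rank(A), rank(augmented)
    if rank_A < rank_aug:
        return "none"        # inconsistent system
    if rank_A == n:
        return "unique"      # full rank, equivalent to det(A) != 0
    return "infinite"        # consistent but rank-deficient, det(A) = 0


print(classify([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3]))  # unique
print(classify([[1, 1], [2, 2]], [2, 4]))                              # infinite
print(classify([[1, 1], [2, 2]], [2, 5]))                              # none
```

The two small 2×2 systems share the same singular matrix and differ only in \(b\), which is exactly the distinction the "consistent" and "inconsistent" conditions in the table capture.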
The inverse method
When \(A\) is invertible, multiplying both sides of \(Ax = b\) on the left by \(A^{-1}\) gives the solution in one line:
\[
x = A^{-1}b
\]
Elegant on paper, but in practice software almost never computes \(A^{-1}\) explicitly — elimination (LU decomposition) is faster and numerically safer. The inverse is a concept you reason with, not an algorithm you run.
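Purely to illustrate the formula, the sketch below builds \(A^{-1}\) by Gauss-Jordan elimination on \([A \mid I]\) and then multiplies it by \(b\). It is a teaching aid under the assumption of a small, exactly representable matrix; as noted above, real code would solve the system directly by elimination instead. The function names are illustrative.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Clear the rest of the column, above and below the pivot.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    # The right half of [I | A^-1] is the inverse.
    return [row[n:] for row in M]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
x = matvec(inverse(A), b)
print([str(v) for v in x])  # ['2', '3', '-1']
```

Computing the inverse this way performs a full elimination for every column of the identity, which is more work than the single elimination needed to solve for one right-hand side.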
Why AI cares
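Training a machine learning model is, at heart, solving for unknowns that satisfy a system of constraints. Linear regression, the ancestor of every neural network, has an exact matrix solution called the normal equation:
\[
w = (X^\top X)^{-1} X^\top y
\]
Here \(X\) is the data matrix, \(y\) the vector of targets, and \(w\) the vector of unknown weights.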
Every symbol there is an operation from Chapter 1: transpose, multiplication, inverse. When the system becomes too large for exact solutions, AI switches to iterative approximation (gradient descent) — but the unknowns, the data, and the gradients all remain matrices. More on this in Matrices in AI.
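A small sketch can show both routes landing on the same answer. Everything specific here is an assumption made for illustration: the toy data, the symbol names `X`, `ys`, and `w`, the learning rate, and the iteration count. The exact route uses the standard normal equation \(w = (X^\top X)^{-1} X^\top y\); the iterative route is plain gradient descent on the mean squared error.

```python
# Hypothetical toy data: points lying exactly on y = 1 + 2t.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
X = [[1.0, t] for t in ts]   # design matrix with a bias column of ones


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Exact route: the normal equation w = (X^T X)^{-1} X^T y.
Xt = transpose(X)
w_exact = matvec(inverse_2x2(matmul(Xt, X)), matvec(Xt, ys))
print(w_exact)  # approximately [1.0, 2.0]

# Iterative route: gradient descent on the mean squared error, from w = 0.
w = [0.0, 0.0]
learning_rate = 0.05          # assumed step size, small enough to converge here
for _ in range(5000):
    residual = [pred - y for pred, y in zip(matvec(X, w), ys)]
    grad = [2 * g / len(ys) for g in matvec(Xt, residual)]   # (2/n) X^T (Xw - y)
    w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
print(w)  # close to w_exact
```

Note that the gradient itself is a matrix expression, \(\tfrac{2}{n} X^\top (Xw - y)\), built from the same transpose and multiplication operations as the exact formula.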