Everything You Must Know — Before, During & After Eigenvalues & Eigenvectors
These are the building blocks. Without them, eigenvalues won't make sense. Think of it like needing to know the alphabet before reading a novel.
Why? Because eigenvalues ARE scalars, eigenvectors ARE vectors, and the matrix is the object that acts on both.
What you need to know:
Scalar: A single number, like \(5\) or \(-3.2\) or \(\pi\). In eigenvalue problems, \(\lambda\) is a scalar.
Vector: An ordered list of numbers. In 2D: \(\mathbf{v} = \begin{pmatrix} 3 \\ -1 \end{pmatrix}\). Think of it as an arrow from the origin to the point \((3, -1)\). Vectors have both magnitude (length) and direction.
Matrix: A rectangular grid of numbers. An \(m \times n\) matrix has \(m\) rows and \(n\) columns. Eigenvalues only apply to square matrices (\(n \times n\)).
Key types: Identity matrix \(\mathbf{I}\) (1's on diagonal, 0's elsewhere), Zero matrix \(\mathbf{0}\), Diagonal matrix, Symmetric matrix (\(\mathbf{A} = \mathbf{A}^T\)), Transpose (\(\mathbf{A}^T\) = rows become columns).
Why? You need these to compute \(\mathbf{A}\mathbf{v}\), \(\mathbf{A} - \lambda\mathbf{I}\), and verify eigenvector solutions.
What you need to know:
Matrix Addition: Add corresponding entries. Both matrices must be the same size.
Scalar Multiplication: Multiply every entry by the scalar. When we write \(\lambda\mathbf{v}\), we scale every component of \(\mathbf{v}\) by \(\lambda\).
Matrix-Vector Multiplication: This is the heart of eigenvalue problems. \(\mathbf{A}\mathbf{v}\) produces a new vector:
Each row of \(\mathbf{A}\) gets "dotted" with \(\mathbf{v}\). The result is a new vector.
Matrix-Matrix Multiplication: \((\mathbf{A}\mathbf{B})_{ij} = \text{row } i \text{ of } \mathbf{A} \cdot \text{col } j \text{ of } \mathbf{B}\). Requires: columns of \(\mathbf{A}\) = rows of \(\mathbf{B}\). Not commutative: \(\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}\) in general!
Dot Product: \(\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n\). If \(\mathbf{u} \cdot \mathbf{v} = 0\), the vectors are orthogonal (perpendicular).
Matrix Inverse: \(\mathbf{A}^{-1}\) exists iff \(\det(\mathbf{A}) \neq 0\). Then \(\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}\). For 2×2:
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]
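All of these operations are easy to check numerically. A minimal NumPy sketch (the matrix and vector here are purely illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([3.0, -1.0])

# Matrix-vector product: each row of A dotted with v
Av = A @ v                      # [2*3 + 1*(-1), 1*3 + 3*(-1)] = [5, 0]

# 2x2 inverse via the ad - bc formula
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # 2*3 - 1*1 = 5
A_inv = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]]) / det

print(Av)                        # [5. 0.]
print(A @ A_inv)                 # the identity matrix (up to rounding)
```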
Why? The characteristic equation IS a determinant: \(\det(\mathbf{A} - \lambda\mathbf{I}) = 0\). You cannot find eigenvalues without this.
What you need to know:
2×2 Determinant:
\[
\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc
\]
3×3 Determinant: Use cofactor expansion along any row or column. Most commonly along the first row:
\[
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg)
\]
Key properties: \(\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A}) \cdot \det(\mathbf{B})\). If \(\det(\mathbf{A}) = 0\), then \(\mathbf{A}\) is singular (not invertible, has eigenvalue 0). Swapping two rows flips the sign. Multiplying a row by \(k\) multiplies the determinant by \(k\).
Connection to eigenvalues: \(\det(\mathbf{A}) = \lambda_1 \cdot \lambda_2 \cdots \lambda_n\) (the product of all eigenvalues).
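This connection (and its companion, trace = sum of eigenvalues) is easy to verify numerically; the matrix below is just an illustration:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues = np.linalg.eigvals(A)             # roots of det(A - λI) = 0; here 5 and 2
print(np.prod(eigenvalues), np.linalg.det(A))  # both 10: det = product of eigenvalues
print(np.sum(eigenvalues), np.trace(A))        # both 7: trace = sum of eigenvalues
```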
Why? Finding eigenvectors requires solving \((\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}\), which is a system of linear equations.
What you need to know:
A system like:
\[
\begin{aligned} 2x + 3y &= 5 \\ 4x - y &= 1 \end{aligned}
\]
can be written as \(\mathbf{A}\mathbf{x} = \mathbf{b}\). When \(\mathbf{b} = \mathbf{0}\), it's called a homogeneous system — and that's exactly what we solve for eigenvectors.
Gaussian Elimination (Row Reduction): Transform the augmented matrix to Row Echelon Form (REF) or Reduced Row Echelon Form (RREF) using three operations: (1) swap rows, (2) multiply a row by a nonzero scalar, (3) add a multiple of one row to another.
Free variables: Variables that don't correspond to a pivot are "free" — you choose their values. In eigenvector problems, there's always at least one free variable (that's why eigenvectors aren't unique — any scalar multiple works).
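A small SymPy sketch (assuming SymPy is available; the matrix is illustrative) shows the free variable appearing when you row-reduce \(\mathbf{A} - \lambda\mathbf{I}\):

```python
from sympy import Matrix, eye

A = Matrix([[2, 1],
            [1, 2]])
lam = 3                               # a known eigenvalue of this A

M = A - lam * eye(2)                  # [[-1, 1], [1, -1]] — singular by construction
rref, pivots = M.rref()
print(pivots)                         # (0,) — one pivot, so one free variable
print(M.nullspace())                  # eigenspace basis: all multiples of (1, 1)
```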
Why? Eigenspaces are subspaces. Understanding the concept deepens your knowledge beyond "just computing."
What you need to know:
A vector space is a set of vectors that is closed under addition and scalar multiplication. \(\mathbb{R}^n\) is the most common example.
A subspace is a subset that is itself a vector space. Examples: any line through the origin in \(\mathbb{R}^2\), any plane through the origin in \(\mathbb{R}^3\).
The set of all eigenvectors for a given eigenvalue \(\lambda\) (plus the zero vector) forms a subspace called the eigenspace \(E_\lambda\).
Four Fundamental Subspaces (Strang's framework): Column space, Null space, Row space, Left null space. The null space of \((\mathbf{A} - \lambda\mathbf{I})\) is the eigenspace for \(\lambda\).
Why? You need to determine if eigenvectors are linearly independent (required for diagonalization).
Linear Independence: Vectors \(\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}\) are linearly independent if the only solution to \(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}\) is \(c_1 = \cdots = c_k = 0\). No vector can be written as a combination of the others.
Key Fact: Eigenvectors corresponding to distinct eigenvalues are always linearly independent. This is why a matrix with \(n\) distinct eigenvalues is always diagonalizable.
Span: The set of all possible linear combinations of a set of vectors.
Basis: A linearly independent set that spans the whole space. A basis for \(\mathbb{R}^n\) has exactly \(n\) vectors. If \(n\) eigenvectors form a basis, the matrix is diagonalizable.
Dimension: The number of vectors in a basis. The dimension of an eigenspace = geometric multiplicity of that eigenvalue.
Why? A matrix IS a linear transformation. Eigenvectors are the directions that the transformation merely scales.
A function \(T: \mathbb{R}^n \to \mathbb{R}^m\) is linear if: \(T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\) and \(T(c\mathbf{u}) = cT(\mathbf{u})\).
Every linear transformation can be represented by a matrix: \(T(\mathbf{v}) = \mathbf{A}\mathbf{v}\).
Key geometric transformations:
| Transformation | Matrix Example (2D) | Eigenvalues |
|---|---|---|
| Scaling by \(k\) | \(\begin{pmatrix}k & 0 \\ 0 & k\end{pmatrix}\) | \(\lambda = k\) (double) |
| Stretch x-axis by 2 | \(\begin{pmatrix}2 & 0 \\ 0 & 1\end{pmatrix}\) | \(\lambda = 2, 1\) |
| Reflection over x-axis | \(\begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}\) | \(\lambda = 1, -1\) |
| 90° rotation | \(\begin{pmatrix}0 & -1 \\ 1 & 0\end{pmatrix}\) | \(\lambda = \pm i\) (complex) |
| Shear | \(\begin{pmatrix}1 & k \\ 0 & 1\end{pmatrix}\) | \(\lambda = 1\) (double) |
| Projection onto x-axis | \(\begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}\) | \(\lambda = 1, 0\) |
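You can confirm entries of this table numerically, e.g. the shear and rotation rows (NumPy sketch):

```python
import numpy as np

shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])

print(np.linalg.eigvals(shear))      # [1. 1.] — λ = 1 (double), as in the table
print(np.linalg.eigvals(rotation))   # ±i — complex, so no real direction is preserved
```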
Why? The null space of \((\mathbf{A} - \lambda\mathbf{I})\) gives you the eigenspace. The rank tells you the dimension of the eigenspace.
Rank: The number of linearly independent rows (or columns) = number of pivots after row reduction.
Null Space (Kernel): The set of all \(\mathbf{v}\) such that \(\mathbf{A}\mathbf{v} = \mathbf{0}\). The eigenspace for \(\lambda\) IS the null space of \((\mathbf{A} - \lambda\mathbf{I})\).
Nullity: Dimension of the null space.
Rank-Nullity Theorem:
\[
\text{rank}(\mathbf{A}) + \text{nullity}(\mathbf{A}) = n \quad \text{(the number of columns)}
\]
This tells you: if the rank of \((\mathbf{A} - \lambda\mathbf{I})\) is \(r\), then the eigenspace has dimension \(n - r\).
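A quick NumPy check of this count, using an illustrative defective matrix (algebraic multiplicity 2, but only a one-dimensional eigenspace):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 3.0]])            # λ = 3 with algebraic multiplicity 2
lam, n = 3.0, 2

r = np.linalg.matrix_rank(A - lam * np.eye(n))
print(n - r)                          # 1: the eigenspace is only one-dimensional
```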
Covered in your main tutorial. This is the foundation of everything.
The characteristic polynomial has degree \(n\) for an \(n \times n\) matrix. By the Fundamental Theorem of Algebra, it always has exactly \(n\) roots (counting multiplicity, possibly complex).
Cayley-Hamilton Theorem: Every matrix satisfies its own characteristic equation. If \(p(\lambda) = \lambda^2 - 7\lambda + 10\), then \(\mathbf{A}^2 - 7\mathbf{A} + 10\mathbf{I} = \mathbf{0}\).
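A NumPy check of this, using a matrix chosen (illustratively) to have exactly that characteristic polynomial:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
# characteristic polynomial: λ² - (trace)λ + det = λ² - 7λ + 10

residual = A @ A - 7 * A + 10 * np.eye(2)
print(residual)                       # the zero matrix, as Cayley-Hamilton promises
```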
The eigenspace is a subspace. Its dimension (the geometric multiplicity) ranges from 1 up to the algebraic multiplicity. When the geometric multiplicity is strictly less than the algebraic multiplicity, the matrix is NOT diagonalizable.
A matrix is diagonalizable iff it has \(n\) linearly independent eigenvectors. Then \(\mathbf{P}\) has eigenvectors as columns and \(\mathbf{D}\) is diagonal with eigenvalues. This makes computing \(\mathbf{A}^k\) and \(e^{\mathbf{A}t}\) trivial.
When is a matrix diagonalizable? (a) \(n\) distinct eigenvalues → always. (b) Symmetric → always. (c) Geometric multiplicity = algebraic multiplicity for all eigenvalues → yes.
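A NumPy sketch of why diagonalization makes powers cheap (illustrative matrix with distinct eigenvalues, so case (a) applies):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # two distinct eigenvalues → diagonalizable

eigvals, P = np.linalg.eig(A)         # columns of P are eigenvectors
D = np.diag(eigvals)

# sanity check: A = P D P⁻¹
assert np.allclose(A, P @ D @ np.linalg.inv(P))

# A⁵ is cheap: just raise the eigenvalues to the 5th power
A5 = P @ np.diag(eigvals ** 5) @ np.linalg.inv(P)
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))   # True
```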
If \(\mathbf{A} = \mathbf{A}^T\) (symmetric), then:
- All eigenvalues are real.
- Eigenvectors for distinct eigenvalues are orthogonal.
- \(\mathbf{A}\) is orthogonally diagonalizable: \(\mathbf{A} = \mathbf{Q}\mathbf{D}\mathbf{Q}^T\) with \(\mathbf{Q}\) orthogonal.
This is the theoretical backbone of PCA. Covariance matrices are symmetric, so PCA always works cleanly.
A symmetric matrix \(\mathbf{A}\) is:
- Positive definite if all eigenvalues are \(> 0\) (equivalently, \(\mathbf{x}^T\mathbf{A}\mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\)).
- Positive semi-definite if all eigenvalues are \(\geq 0\).
- Negative definite if all eigenvalues are \(< 0\); indefinite if the eigenvalues have mixed signs.
Why this matters: Covariance matrices are always positive semi-definite. In optimization, positive definite Hessians mean you're at a minimum. This connects eigenvalues to convexity and machine learning loss landscapes.
Knowledge needed: Variance, covariance, covariance matrix, standardization (z-scores), matrix multiplication, eigendecomposition.
The PCA Pipeline:
1. Standardize the data (center it; optionally scale to unit variance).
2. Compute the covariance matrix.
3. Find its eigenvalues and eigenvectors.
4. Sort the eigenvectors by eigenvalue, descending — the largest eigenvalue marks the direction of maximum variance.
5. Project the data onto the top \(k\) eigenvectors (the principal components).
Related methods: Factor Analysis, Independent Component Analysis (ICA), t-SNE, UMAP (nonlinear alternatives).
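The PCA pipeline can be carried out by hand in a few lines of NumPy (synthetic toy data; `sklearn.decomposition.PCA` wraps the same computation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.0],
                                          [0.0, 0.5]])   # correlated toy data

Xc = X - X.mean(axis=0)                      # 1. center the data
C = (Xc.T @ Xc) / (len(Xc) - 1)              # 2. covariance matrix (symmetric, PSD)
eigvals, eigvecs = np.linalg.eigh(C)         # 3. eigendecomposition (eigh: symmetric)
order = np.argsort(eigvals)[::-1]            # 4. sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs[:, :1]                 # 5. project onto the top component

print(eigvals)                               # variance captured by each component
```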
The generalization of eigendecomposition to any matrix (including non-square). Left singular vectors = eigenvectors of \(\mathbf{A}\mathbf{A}^T\). Right singular vectors = eigenvectors of \(\mathbf{A}^T\mathbf{A}\). Singular values = square roots of eigenvalues of \(\mathbf{A}^T\mathbf{A}\).
Applications: Image compression, recommendation systems (Netflix), NLP (LSA), pseudoinverse, low-rank approximation (Eckart-Young theorem).
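Both relationships — singular values vs. eigenvalues of \(\mathbf{A}^T\mathbf{A}\), and low-rank approximation — can be checked with NumPy (random illustrative matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# singular values are the square roots of the eigenvalues of AᵀA
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s ** 2, eig_AtA))               # True

# best rank-2 approximation in the least-squares sense (Eckart-Young)
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
```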
Knowledge needed: Probability, transition matrices (columns sum to 1), steady-state concept.
The steady state \(\boldsymbol{\pi}\) satisfies \(\mathbf{P}\boldsymbol{\pi} = \boldsymbol{\pi}\), which is an eigenvector equation with \(\lambda = 1\). The Perron-Frobenius theorem guarantees that any positive stochastic matrix has a unique steady state.
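A minimal NumPy sketch of finding the steady state as the \(\lambda = 1\) eigenvector (illustrative two-state chain):

```python
import numpy as np

# column-stochastic transition matrix: each column sums to 1
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])

eigvals, eigvecs = np.linalg.eig(P)
k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                      # rescale the eigenvector to a probability vector

print(pi)                               # steady state ≈ [0.833, 0.167]; satisfies P @ pi = pi
```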
Applications: PageRank, weather modeling, population genetics, queueing theory, financial modeling.
Knowledge needed: Calculus (derivatives), exponential functions, systems of ODEs.
Stability classification: All \(\text{Re}(\lambda_i) < 0\) → stable node/spiral. Mixed signs → saddle point. All positive → unstable. Imaginary parts → oscillation. This is the foundation of control theory, ecology models, and circuit analysis.
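This classification is a one-liner to apply in NumPy. As an illustrative example, the damped oscillator \(x'' + 3x' + 2x = 0\) written as a first-order system:

```python
import numpy as np

J = np.array([[ 0.0,  1.0],
              [-2.0, -3.0]])           # companion matrix of x'' + 3x' + 2x = 0

eigvals = np.linalg.eigvals(J)
print(eigvals)                          # eigenvalues -1 and -2: all real parts negative → stable
```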
Knowledge needed: Graph basics (nodes, edges, adjacency matrix), Laplacian matrix.
Key matrices:
Applications: Spectral clustering, community detection, graph partitioning, network robustness analysis.
A quadratic form \(f(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}\) appears everywhere in optimization (least squares, SVM, neural networks).
Rayleigh Quotient:
\[
R(\mathbf{x}) = \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}
\]
This is maximized when \(\mathbf{x}\) is the eigenvector with the largest eigenvalue, and minimized at the smallest eigenvalue. This is literally what PCA computes.
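A quick numerical check (illustrative symmetric matrix with eigenvalues 1 and 3):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric; eigenvalues 1 and 3

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

eigvals, eigvecs = np.linalg.eigh(A)       # ascending order: eigvals = [1, 3]
print(rayleigh(eigvecs[:, -1]))            # 3.0 — the maximum, at the top eigenvector
print(rayleigh(np.array([1.0, 0.0])))      # 2.0 — any other direction lands in [1, 3]
```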
Hessian matrix: In optimization, the Hessian (second derivatives) determines if you're at a min, max, or saddle point — based on its eigenvalues. All positive = minimum. All negative = maximum. Mixed = saddle.
When a matrix is NOT diagonalizable (geometric multiplicity < algebraic multiplicity), the Jordan form is the "best you can do." It's almost diagonal, with 1's on the superdiagonal in Jordan blocks.
Needed for: solving differential equations with repeated eigenvalues, matrix exponentials of non-diagonalizable matrices.
When there aren't enough regular eigenvectors, generalized eigenvectors fill the gap. They satisfy \((\mathbf{A} - \lambda\mathbf{I})^k\mathbf{v} = \mathbf{0}\) for some \(k > 1\). These form the Jordan chains needed for Jordan Normal Form.
For large matrices (millions × millions), you can't solve the characteristic polynomial. Instead:
- Power method: repeatedly multiply by \(\mathbf{A}\) to converge to the dominant eigenvector.
- QR algorithm: the workhorse behind dense eigensolvers like `np.linalg.eig`.
- Lanczos / Arnoldi iteration: Krylov-subspace methods for large sparse matrices (the approach behind `scipy.sparse.linalg.eigs`).
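A minimal power-method sketch in NumPy (illustrative matrix; in practice you would call a library routine):

```python
import numpy as np

def power_method(A, iters=200, seed=0):
    """Estimate the dominant eigenpair by repeated multiplication."""
    x = np.random.default_rng(seed).normal(size=A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)          # renormalize so the iterate doesn't blow up
    return x @ A @ x, x                 # Rayleigh-quotient eigenvalue estimate

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])              # eigenvalues 5 and 2
lam, v = power_method(A)
print(lam)                              # ≈ 5.0, the dominant eigenvalue
```

Convergence speed depends on the gap between the top two eigenvalues: the error shrinks like \((|\lambda_2|/|\lambda_1|)^k\).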
Any analytic function can be applied to a matrix via its eigenvalues: \(f(\mathbf{A}) = \mathbf{P}\,f(\mathbf{D})\,\mathbf{P}^{-1}\). This includes \(\sin(\mathbf{A})\), \(\cos(\mathbf{A})\), \(\log(\mathbf{A})\), \(\sqrt{\mathbf{A}}\), etc.
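For example, a matrix square root via eigendecomposition (illustrative symmetric positive-definite matrix, so \(\mathbf{P}^{-1} = \mathbf{P}^T\)):

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [4.0, 5.0]])                       # symmetric; eigenvalues 9 and 1

eigvals, Q = np.linalg.eigh(A)                   # orthogonal Q, so Q⁻¹ = Qᵀ
sqrtA = Q @ np.diag(np.sqrt(eigvals)) @ Q.T      # f(A) = Q f(D) Qᵀ with f = sqrt

print(np.allclose(sqrtA @ sqrtA, A))             # True: a genuine matrix square root
```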
How much do eigenvalues change when the matrix changes slightly? This matters for numerical stability and noisy data.
Weyl's Theorem: For symmetric matrices, if \(\|\mathbf{E}\|\) is small, then eigenvalues of \(\mathbf{A} + \mathbf{E}\) are close to those of \(\mathbf{A}\). This is why PCA is robust to small noise.
Condition number: \(\kappa = |\lambda_{\max}|/|\lambda_{\min}|\) (for normal matrices). A large condition number means the matrix is nearly singular, so computations involving it — such as solving \(\mathbf{A}\mathbf{x} = \mathbf{b}\) — are sensitive to perturbation.
Studies eigenvalue distributions of random matrices. Key results:
- Wigner's semicircle law: the eigenvalues of large symmetric random matrices follow a semicircular density.
- Marchenko-Pastur law: the eigenvalue distribution of sample covariance matrices — used to separate signal from noise in PCA.
Applications: Finance (Markowitz portfolio theory with noisy data), genomics, wireless communications.
Tensors generalize matrices to higher dimensions (3D arrays and beyond). Tensor eigenvalues/decompositions are an active research area.
Applications: Deep learning (weight tensors), signal processing, chemometrics, brain imaging (fMRI).
You need these statistics concepts to understand PCA and covariance-based eigenvalue applications:
- Mean and variance of a dataset.
- Covariance and the covariance matrix (always symmetric and positive semi-definite).
- Correlation vs. covariance.
- Standardization (z-scores): subtract the mean, divide by the standard deviation.
In practice, nobody hand-computes eigenvalues for matrices larger than 3×3. You need to know:
- NumPy: `np.linalg.eig(A)`, `np.linalg.eigh(A)` (symmetric), `np.linalg.svd(A)`
- SciPy: `scipy.linalg.eig(A)`, `scipy.sparse.linalg.eigs(A, k=10)` (for large sparse matrices — find top \(k\) eigenvalues)
- scikit-learn: `sklearn.decomposition.PCA` — the high-level PCA interface that handles everything

| Week | Phase | Topics | Practice |
|---|---|---|---|
| 1–2 | Prerequisites | Vectors, matrices, matrix multiplication, determinants | Compute 10+ determinants (2×2 and 3×3). Multiply matrices by hand. |
| 3 | Prerequisites | Gaussian elimination, linear independence, rank, null space | Row-reduce 5+ matrices. Find null spaces. |
| 4–5 | Core Eigen | Eigenvalue definition, characteristic equation, finding eigenvectors | Solve 15+ eigenvalue problems by hand (2×2 and 3×3). |
| 6 | Core Eigen | Diagonalization, symmetric matrices, spectral theorem | Diagonalize 5+ matrices. Verify \(\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}\). |
| 7 | Applications | PCA: covariance, variance, projection | Do PCA by hand on 2D data. Then use scikit-learn on a real dataset. |
| 8 | Applications | SVD, Markov chains, differential equations basics | Find steady state of a Markov chain. Compress an image with SVD in Python. |
| 9–10 | Applications | Spectral clustering, quadratic forms, optimization | Implement spectral clustering on a toy dataset. Analyze Hessian eigenvalues. |
| 11–12 | Advanced | Numerical methods, Jordan form, perturbation theory | Implement the power method from scratch. Read a research paper. |
Don't skip Phase 1. I've seen many students struggle with eigenvalues not because the concept is hard, but because they're shaky on matrix multiplication, determinants, or row reduction. Spend the time on prerequisites — it pays off enormously.
Watch 3Blue1Brown FIRST. Before doing any computation, watch the "Essence of Linear Algebra" series. The visual intuition will make everything click 10× faster.
Do problems by hand AND by code. Hand computation builds understanding. Code handles real-world scale. You need both.
Can you confidently answer "yes" to each of these? If not, revisit that topic.
Eigenvalues & eigenvectors sit at the intersection of linear algebra, calculus, statistics, and probability. To truly master them, you need a solid foundation in all four. But don't be overwhelmed — start with Phase 1, build gradually, and the connections will emerge naturally. Every new concept you learn will make the eigenvalue story richer and more powerful.
"In mathematics, you don't understand things. You just get used to them." — John von Neumann
Complete Learning Roadmap for Eigenvalues & Eigenvectors in Data Science • March 2026
© 2026 Sim Vattanac. All rights reserved.