The Complete Learning Roadmap

Everything You Must Know — Before, During & After Eigenvalues & Eigenvectors


Table of Contents

Phase 1 — Prerequisites (Learn BEFORE Eigenvalues)
  • 1.1 Scalars, Vectors & Matrices
  • 1.2 Matrix Operations
  • 1.3 Determinants
  • 1.4 Systems of Linear Equations & Gaussian Elimination
  • 1.5 Vector Spaces & Subspaces
  • 1.6 Linear Independence, Span & Basis
  • 1.7 Linear Transformations
  • 1.8 Rank, Null Space & Rank-Nullity Theorem
Phase 2 — Core Eigenvalue Theory (The Main Topic)
  • 2.1 Eigenvalue & Eigenvector Definition
  • 2.2 Characteristic Equation & Polynomial
  • 2.3 Eigenspaces
  • 2.4 Diagonalization
  • 2.5 Symmetric Matrices & Spectral Theorem
  • 2.6 Positive Definite Matrices
Phase 3 — Applications (Data Science & Engineering)
  • 3.1 PCA & Dimensionality Reduction
  • 3.2 Singular Value Decomposition (SVD)
  • 3.3 Markov Chains & Stochastic Processes
  • 3.4 Differential Equations & Dynamical Systems
  • 3.5 Graph Theory & Network Analysis
  • 3.6 Optimization & Quadratic Forms
Phase 4 — Advanced Topics
  • 4.1 Jordan Normal Form
  • 4.2 Generalized Eigenvectors
  • 4.3 Numerical Methods for Eigenvalues
  • 4.4 Matrix Functions & Exponentials
  • 4.5 Perturbation Theory
  • 4.6 Random Matrix Theory
  • 4.7 Tensor Decompositions
Phase 5 — Tools, Code & Statistics Co-Requisites
  • 5.1 Statistics Prerequisites for PCA
  • 5.2 Calculus Prerequisites
  • 5.3 Probability Prerequisites
  • 5.4 Python / NumPy / scikit-learn
  • 5.5 Recommended Study Plan & Timeline
Phase 1 — Prerequisites

What You Must Know BEFORE Eigenvalues

These are the building blocks. Without them, eigenvalues won't make sense. Think of it like needing to know the alphabet before reading a novel.

1.1 Scalars, Vectors & Matrices Critical

Why? Because eigenvalues ARE scalars, eigenvectors ARE vectors, and everything operates on matrices.

What you need to know:

Scalar: A single number, like \(5\) or \(-3.2\) or \(\pi\). In eigenvalue problems, \(\lambda\) is a scalar.

Vector: An ordered list of numbers. In 2D: \(\mathbf{v} = \begin{pmatrix} 3 \\ -1 \end{pmatrix}\). Think of it as an arrow from the origin to the point \((3, -1)\). Vectors have both magnitude (length) and direction.

$$\text{Length: } \|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$

Matrix: A rectangular grid of numbers. An \(m \times n\) matrix has \(m\) rows and \(n\) columns. Eigenvalues only apply to square matrices (\(n \times n\)).

$$\mathbf{A} = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 3 & 4 \\ 0 & 5 & -2 \end{pmatrix} \quad (3 \times 3 \text{ matrix})$$

Key types: Identity matrix \(\mathbf{I}\) (1's on diagonal, 0's elsewhere), Zero matrix \(\mathbf{0}\), Diagonal matrix, Symmetric matrix (\(\mathbf{A} = \mathbf{A}^T\)), Transpose (\(\mathbf{A}^T\) = rows become columns).

1.2 Matrix Operations Critical

Why? You need these to compute \(\mathbf{A}\mathbf{v}\), \(\mathbf{A} - \lambda\mathbf{I}\), and verify eigenvector solutions.

What you need to know:

Matrix Addition: Add corresponding entries. Both matrices must be the same size.

Scalar Multiplication: Multiply every entry by the scalar. When we write \(\lambda\mathbf{v}\), we scale every component of \(\mathbf{v}\) by \(\lambda\).

Matrix-Vector Multiplication: This is the heart of eigenvalue problems. \(\mathbf{A}\mathbf{v}\) produces a new vector:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}$$

Each row of \(\mathbf{A}\) gets "dotted" with \(\mathbf{v}\). The result is a new vector.

Matrix-Matrix Multiplication: \((\mathbf{A}\mathbf{B})_{ij} = \text{row } i \text{ of } \mathbf{A} \cdot \text{col } j \text{ of } \mathbf{B}\). Requires: columns of \(\mathbf{A}\) = rows of \(\mathbf{B}\). Not commutative: \(\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}\) in general!

Dot Product: \(\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n\). If \(\mathbf{u} \cdot \mathbf{v} = 0\), the vectors are orthogonal (perpendicular).

Matrix Inverse: \(\mathbf{A}^{-1}\) exists iff \(\det(\mathbf{A}) \neq 0\). Then \(\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}\). For 2×2:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
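These operations are quick to check in NumPy. The matrix and vector below are made up for illustration:

```python
import numpy as np

# A made-up 2x2 matrix and vector to exercise the operations above.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
v = np.array([1.0, -2.0])

Av = A @ v                    # matrix-vector product: (4*1 + 1*(-2), 2*1 + 3*(-2)) = (2, -4)
dot = v @ v                   # dot product of v with itself: 1 + 4 = 5
A_inv = np.linalg.inv(A)      # exists because det(A) = 4*3 - 1*2 = 10, which is nonzero
identity_check = A @ A_inv    # should be (numerically) the 2x2 identity
```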

1.3 Determinants Critical

Why? The characteristic equation IS a determinant: \(\det(\mathbf{A} - \lambda\mathbf{I}) = 0\). You cannot find eigenvalues without this.

What you need to know:

2×2 Determinant:

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$

3×3 Determinant: Use cofactor expansion along any row or column. Most commonly along the first row:

$$\det\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = a(ei-fh) - b(di-fg) + c(dh-eg)$$

Key properties: \(\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A}) \cdot \det(\mathbf{B})\). If \(\det(\mathbf{A}) = 0\), then \(\mathbf{A}\) is singular (not invertible, has eigenvalue 0). Swapping two rows flips the sign. Multiplying a row by \(k\) multiplies the determinant by \(k\).

Connection to eigenvalues: \(\det(\mathbf{A}) = \lambda_1 \cdot \lambda_2 \cdots \lambda_n\) (the product of all eigenvalues).
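The determinant–eigenvalue connection is easy to verify numerically. The symmetric matrix below is a made-up example with eigenvalues 1 and 3:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # eigenvalues are 1 and 3

det_A = np.linalg.det(A)              # 2*2 - 1*1 = 3
eigvals = np.linalg.eigvals(A)        # [1, 3] (order not guaranteed)
prod_eigs = np.prod(eigvals)          # det(A) equals the product of all eigenvalues
```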

1.4 Systems of Linear Equations & Gaussian Elimination Critical

Why? Finding eigenvectors requires solving \((\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}\), which is a system of linear equations.

What you need to know:

A system like:

$$2x + y = 0, \quad x - 3y = 0$$

can be written as \(\mathbf{A}\mathbf{x} = \mathbf{b}\). When \(\mathbf{b} = \mathbf{0}\), it's called a homogeneous system — and that's exactly what we solve for eigenvectors.

Gaussian Elimination (Row Reduction): Transform the augmented matrix to Row Echelon Form (REF) or Reduced Row Echelon Form (RREF) using three operations: (1) swap rows, (2) multiply a row by a nonzero scalar, (3) add a multiple of one row to another.

Free variables: Variables that don't correspond to a pivot are "free" — you choose their values. In eigenvector problems, there's always at least one free variable (that's why eigenvectors aren't unique — any scalar multiple works).
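The eigenvector computation described above can be sketched with SciPy's `null_space`, which performs the row-reduction step for you. The matrix and eigenvalue below are made up for illustration:

```python
import numpy as np
from scipy.linalg import null_space

# A made-up matrix with eigenvalue 5 (its eigenvector lies along (1, 1)).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam = 5.0

# The eigenvectors for lam are the nonzero solutions of the homogeneous
# system (A - lam*I) v = 0, i.e. the null space of (A - lam*I).
v = null_space(A - lam * np.eye(2))[:, 0]   # one basis vector of the eigenspace
```

Any scalar multiple of `v` is also an eigenvector, which is exactly the free-variable point above.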

1.5 Vector Spaces & Subspaces Important

Why? Eigenspaces are subspaces. Understanding the concept deepens your knowledge beyond "just computing."

What you need to know:

A vector space is a set of vectors that is closed under addition and scalar multiplication. \(\mathbb{R}^n\) is the most common example.

A subspace is a subset that is itself a vector space. Examples: any line through the origin in \(\mathbb{R}^2\), any plane through the origin in \(\mathbb{R}^3\).

The set of all eigenvectors for a given eigenvalue \(\lambda\) (plus the zero vector) forms a subspace called the eigenspace \(E_\lambda\).

Four Fundamental Subspaces (Strang's framework): Column space, Null space, Row space, Left null space. The null space of \((\mathbf{A} - \lambda\mathbf{I})\) is the eigenspace for \(\lambda\).

1.6 Linear Independence, Span & Basis Important

Why? You need to determine if eigenvectors are linearly independent (required for diagonalization).

Linear Independence: Vectors \(\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}\) are linearly independent if the only solution to \(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}\) is \(c_1 = \cdots = c_k = 0\). No vector can be written as a combination of the others.

Key Fact: Eigenvectors corresponding to distinct eigenvalues are always linearly independent. This is why a matrix with \(n\) distinct eigenvalues is always diagonalizable.

Span: The set of all possible linear combinations of a set of vectors.

Basis: A linearly independent set that spans the whole space. A basis for \(\mathbb{R}^n\) has exactly \(n\) vectors. If \(n\) eigenvectors form a basis, the matrix is diagonalizable.

Dimension: The number of vectors in a basis. The dimension of an eigenspace = geometric multiplicity of that eigenvalue.

1.7 Linear Transformations Important

Why? A matrix IS a linear transformation. Eigenvectors are the directions that the transformation merely scales.

A function \(T: \mathbb{R}^n \to \mathbb{R}^m\) is linear if: \(T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\) and \(T(c\mathbf{u}) = cT(\mathbf{u})\).

Every linear transformation can be represented by a matrix: \(T(\mathbf{v}) = \mathbf{A}\mathbf{v}\).

Key geometric transformations:

  • Scaling by \(k\): \(\begin{pmatrix}k & 0 \\ 0 & k\end{pmatrix}\), eigenvalues \(\lambda = k\) (double)
  • Stretch x-axis by 2: \(\begin{pmatrix}2 & 0 \\ 0 & 1\end{pmatrix}\), eigenvalues \(\lambda = 2, 1\)
  • Reflection over x-axis: \(\begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}\), eigenvalues \(\lambda = 1, -1\)
  • 90° rotation: \(\begin{pmatrix}0 & -1 \\ 1 & 0\end{pmatrix}\), eigenvalues \(\lambda = \pm i\) (complex)
  • Shear: \(\begin{pmatrix}1 & k \\ 0 & 1\end{pmatrix}\), eigenvalues \(\lambda = 1\) (double)
  • Projection onto x-axis: \(\begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}\), eigenvalues \(\lambda = 1, 0\)
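Two of these transformations can be verified directly with NumPy, including the rotation's complex eigenvalues:

```python
import numpy as np

rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])       # 90° rotation
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])           # shear with k = 1

rot_eigs = np.linalg.eigvals(rotation)   # +/- i: no real direction is merely scaled
shear_eigs = np.linalg.eigvals(shear)    # 1 (double): only the x-axis is preserved
```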

1.8 Rank, Null Space & Rank-Nullity Theorem Important

Why? The null space of \((\mathbf{A} - \lambda\mathbf{I})\) gives you the eigenspace. The rank tells you the dimension of the eigenspace.

Rank: The number of linearly independent rows (or columns) = number of pivots after row reduction.

Null Space (Kernel): The set of all \(\mathbf{v}\) such that \(\mathbf{A}\mathbf{v} = \mathbf{0}\). The eigenspace for \(\lambda\) IS the null space of \((\mathbf{A} - \lambda\mathbf{I})\).

Nullity: Dimension of the null space.

Rank-Nullity Theorem:

$$\text{rank}(\mathbf{A}) + \text{nullity}(\mathbf{A}) = n$$

This tells you: if the rank of \((\mathbf{A} - \lambda\mathbf{I})\) is \(r\), then the eigenspace has dimension \(n - r\).
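The shear matrix from the table in 1.7 illustrates this: its eigenvalue 1 has algebraic multiplicity 2, but the rank of \(\mathbf{A} - \mathbf{I}\) is 1, so the eigenspace is only one-dimensional:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # shear: eigenvalue 1 with algebraic multiplicity 2
n = 2

r = np.linalg.matrix_rank(A - 1.0 * np.eye(n))
eigenspace_dim = n - r        # rank-nullity: geometric multiplicity = n - r = 1
```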

Phase 2 — Core Eigenvalue Theory

The Main Topic Itself

2.1 Eigenvalue & Eigenvector Definition Critical

$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}, \quad \mathbf{v} \neq \mathbf{0}$$

Covered in your main tutorial. This is the foundation of everything.

2.2 Characteristic Equation & Polynomial Critical

$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$

The characteristic polynomial has degree \(n\) for an \(n \times n\) matrix. By the Fundamental Theorem of Algebra, it always has exactly \(n\) roots (counting multiplicity, possibly complex).

Cayley-Hamilton Theorem: Every matrix satisfies its own characteristic equation. If \(p(\lambda) = \lambda^2 - 7\lambda + 10\), then \(\mathbf{A}^2 - 7\mathbf{A} + 10\mathbf{I} = \mathbf{0}\).
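Cayley-Hamilton is easy to check numerically. The matrix below is made up to have exactly that characteristic polynomial (trace 7, determinant 10):

```python
import numpy as np

# Made-up matrix with characteristic polynomial lambda^2 - 7*lambda + 10.
A = np.array([[3.0, 2.0],
              [1.0, 4.0]])

# Cayley-Hamilton: substituting A into its own characteristic
# polynomial gives the zero matrix.
residual = A @ A - 7 * A + 10 * np.eye(2)
```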

2.3 Eigenspaces Important

$$E_\lambda = \ker(\mathbf{A} - \lambda\mathbf{I}) = \{\mathbf{v} \in \mathbb{R}^n : \mathbf{A}\mathbf{v} = \lambda\mathbf{v}\}$$

The eigenspace is a subspace. Its dimension (the geometric multiplicity) ranges from 1 up to the algebraic multiplicity. When the geometric multiplicity is strictly less than the algebraic multiplicity, the matrix is NOT diagonalizable.

2.4 Diagonalization Critical

$$\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}$$

A matrix is diagonalizable iff it has \(n\) linearly independent eigenvectors. Then \(\mathbf{P}\) has eigenvectors as columns and \(\mathbf{D}\) is diagonal with eigenvalues. This makes computing \(\mathbf{A}^k\) and \(e^{\mathbf{A}t}\) trivial.

When is a matrix diagonalizable? (a) \(n\) distinct eigenvalues → always. (b) Symmetric → always. (c) Geometric multiplicity = algebraic multiplicity for all eigenvalues → yes.
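A diagonalization sketch in NumPy, using a made-up matrix with distinct eigenvalues 5 and 2, including the cheap computation of \(\mathbf{A}^5\):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])         # made-up example; eigenvalues 5 and 2

eigvals, P = np.linalg.eig(A)      # columns of P are the eigenvectors
D = np.diag(eigvals)

A_rebuilt = P @ D @ np.linalg.inv(P)            # A = P D P^-1

# Powers become trivial: only the diagonal entries are raised to the power.
A5 = P @ np.diag(eigvals**5) @ np.linalg.inv(P)
```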

2.5 Symmetric Matrices & Spectral Theorem Critical

If \(\mathbf{A} = \mathbf{A}^T\) (symmetric), then:

  • All eigenvalues are real
  • Eigenvectors for distinct eigenvalues are orthogonal
  • \(\mathbf{A} = \mathbf{Q}\mathbf{D}\mathbf{Q}^T\) where \(\mathbf{Q}\) is orthogonal (\(\mathbf{Q}^{-1} = \mathbf{Q}^T\))
$$\mathbf{A} = \lambda_1 \mathbf{q}_1\mathbf{q}_1^T + \lambda_2 \mathbf{q}_2\mathbf{q}_2^T + \cdots + \lambda_n \mathbf{q}_n\mathbf{q}_n^T$$

This is the theoretical backbone of PCA. Covariance matrices are symmetric, so PCA always works cleanly.
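All three bullet points can be verified with `np.linalg.eigh` (the routine designed for symmetric matrices), on a small made-up example:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # made-up symmetric matrix

lams, Q = np.linalg.eigh(S)          # real eigenvalues, orthonormal eigenvectors

orth_check = Q.T @ Q                 # Q orthogonal: Q^T Q = I, so Q^-1 = Q^T

# Rank-one spectral sum: S = sum_i lam_i * q_i q_i^T
S_rebuilt = sum(lams[i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))
```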

2.6 Positive Definite & Semi-Definite Matrices Important

A symmetric matrix \(\mathbf{A}\) is:

  • Positive definite: All eigenvalues \(> 0\). Equivalently: \(\mathbf{x}^T\mathbf{A}\mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\).
  • Positive semi-definite: All eigenvalues \(\geq 0\). Equivalently: \(\mathbf{x}^T\mathbf{A}\mathbf{x} \geq 0\).
  • Negative definite: All eigenvalues \(< 0\).
  • Indefinite: Has both positive and negative eigenvalues.

Why this matters: Covariance matrices are always positive semi-definite. In optimization, positive definite Hessians mean you're at a minimum. This connects eigenvalues to convexity and machine learning loss landscapes.

Phase 3 — Applications

Where Eigenvalues Meet the Real World

3.1 PCA & Dimensionality Reduction Critical

Knowledge needed: Variance, covariance, covariance matrix, standardization (z-scores), matrix multiplication, eigendecomposition.

The PCA Pipeline:

$$\text{Raw Data} \xrightarrow{\text{center}} \text{Mean-centered} \xrightarrow{\text{compute}} \mathbf{\Sigma} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X} \xrightarrow{\text{eigen}} \lambda_i, \mathbf{v}_i \xrightarrow{\text{project}} \mathbf{X}\mathbf{V}_k$$

Related methods: Factor Analysis, Independent Component Analysis (ICA), t-SNE, UMAP (nonlinear alternatives).
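The pipeline above can be sketched end-to-end in NumPy. The five data points below are made up for illustration:

```python
import numpy as np

# Tiny made-up data set: 5 samples, 2 correlated features.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

Xc = X - X.mean(axis=0)                 # 1. mean-center
Sigma = Xc.T @ Xc / (len(X) - 1)        # 2. covariance matrix
lams, V = np.linalg.eigh(Sigma)         # 3. eigendecomposition (ascending order)
order = np.argsort(lams)[::-1]          #    sort eigenvalues descending
lams, V = lams[order], V[:, order]
scores = Xc @ V[:, :1]                  # 4. project onto the top component

explained = lams[0] / lams.sum()        # fraction of variance captured by PC1
```

On strongly correlated data like this, the first component captures well over 90% of the variance.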

3.2 Singular Value Decomposition (SVD) Critical

$$\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$$

The generalization of eigendecomposition to any matrix (including non-square). Left singular vectors = eigenvectors of \(\mathbf{A}\mathbf{A}^T\). Right singular vectors = eigenvectors of \(\mathbf{A}^T\mathbf{A}\). Singular values = square roots of eigenvalues of \(\mathbf{A}^T\mathbf{A}\).

Applications: Image compression, recommendation systems (Netflix), NLP (LSA), pseudoinverse, low-rank approximation (Eckart-Young theorem).
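The eigendecomposition connections stated above can be checked on a made-up non-square matrix:

```python
import numpy as np

# Non-square made-up matrix: SVD applies where eigendecomposition does not.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values are the square roots of the eigenvalues of A^T A.
gram_eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

A_rebuilt = U @ np.diag(s) @ Vt       # A = U Sigma V^T
```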

3.3 Markov Chains & Stochastic Processes Important

Knowledge needed: Probability, transition matrices (columns sum to 1), steady-state concept.

The steady state \(\boldsymbol{\pi}\) satisfies \(\mathbf{P}\boldsymbol{\pi} = \boldsymbol{\pi}\), which is an eigenvector equation with \(\lambda = 1\). The Perron-Frobenius theorem guarantees that any positive stochastic matrix has a unique steady state.

Applications: PageRank, weather modeling, population genetics, queueing theory, financial modeling.
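A steady state can be found as the \(\lambda = 1\) eigenvector. The two-state chain below is a made-up example (columns sum to 1):

```python
import numpy as np

# Made-up 2-state chain: state 0 = sunny, state 1 = rainy.
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])

lams, V = np.linalg.eig(P)
k = np.argmin(np.abs(lams - 1.0))    # pick the eigenvalue closest to 1
pi = np.real(V[:, k])
pi = pi / pi.sum()                   # normalize so the probabilities sum to 1
```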

3.4 Differential Equations & Dynamical Systems Important

Knowledge needed: Calculus (derivatives), exponential functions, systems of ODEs.

$$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} \implies \mathbf{x}(t) = \sum_i c_i e^{\lambda_i t}\mathbf{v}_i$$

Stability classification: All \(\text{Re}(\lambda_i) < 0\) → stable node/spiral. Mixed signs → saddle point. All positive → unstable. Imaginary parts → oscillation. This is the foundation of control theory, ecology models, and circuit analysis.

3.5 Graph Theory & Network Analysis Important

Knowledge needed: Graph basics (nodes, edges, adjacency matrix), Laplacian matrix.

Key matrices:

  • Adjacency matrix \(\mathbf{A}\): Eigenvalues relate to graph properties (bipartiteness, expansion, diameter)
  • Laplacian \(\mathbf{L} = \mathbf{D} - \mathbf{A}\): Number of zero eigenvalues = number of connected components. Second-smallest eigenvalue (Fiedler value) measures connectivity.

Applications: Spectral clustering, community detection, graph partitioning, network robustness analysis.
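The zero-eigenvalue counting rule for the Laplacian can be demonstrated on a made-up graph with two components:

```python
import numpy as np

# Made-up graph on 4 nodes with two components: {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))                     # degree matrix
L = D - A                                      # graph Laplacian

lams = np.sort(np.linalg.eigvalsh(L))
n_components = np.sum(np.isclose(lams, 0.0))   # count (near-)zero eigenvalues
```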

3.6 Optimization & Quadratic Forms Important

A quadratic form \(f(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}\) appears everywhere in optimization (least squares, SVM, neural networks).

Rayleigh Quotient:

$$R(\mathbf{x}) = \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$$

This is maximized when \(\mathbf{x}\) is the eigenvector with the largest eigenvalue, and minimized at the smallest eigenvalue. This is literally what PCA computes.

Hessian matrix: In optimization, the Hessian (second derivatives) determines if you're at a min, max, or saddle point — based on its eigenvalues. All positive = minimum. All negative = maximum. Mixed = saddle.
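The Rayleigh quotient claim can be checked numerically: for a symmetric matrix it always lies between the smallest and largest eigenvalue, and it hits the maximum at the top eigenvector. The matrix below is a made-up example with eigenvalues 1 and 3:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # made-up symmetric matrix; eigenvalues 1 and 3

def rayleigh(S, x):
    """Rayleigh quotient R(x) = (x^T S x) / (x^T x)."""
    return (x @ S @ x) / (x @ x)

lams, Q = np.linalg.eigh(S)
top = Q[:, -1]                       # eigenvector for the largest eigenvalue

# Sample random directions: none beats the top eigenvector.
rng = np.random.default_rng(0)
samples = [rayleigh(S, rng.standard_normal(2)) for _ in range(1000)]
```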

Phase 4 — Advanced Topics

Graduate-Level & Research Topics

4.1 Jordan Normal Form Helpful

When a matrix is NOT diagonalizable (geometric multiplicity < algebraic multiplicity), the Jordan form is the "best you can do." It's almost diagonal, with 1's on the superdiagonal in Jordan blocks.

$$\mathbf{J} = \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}$$

Needed for: solving differential equations with repeated eigenvalues, matrix exponentials of non-diagonalizable matrices.

4.2 Generalized Eigenvectors Helpful

When there aren't enough regular eigenvectors, generalized eigenvectors fill the gap. They satisfy \((\mathbf{A} - \lambda\mathbf{I})^k\mathbf{v} = \mathbf{0}\) for some \(k > 1\). These form the Jordan chains needed for Jordan Normal Form.

4.3 Numerical Methods for Eigenvalues Important

For large matrices (millions × millions), you can't solve the characteristic polynomial. Instead:

  • Power Method: Iteratively multiply by \(\mathbf{A}\) to find the dominant eigenvalue. Converges if there's a clear largest eigenvalue. \(\mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k / \|\mathbf{A}\mathbf{x}_k\|\)
  • Inverse Iteration: Apply power method to \(\mathbf{A}^{-1}\) to find the smallest eigenvalue.
  • QR Algorithm: The workhorse algorithm. Iteratively factor \(\mathbf{A} = \mathbf{Q}\mathbf{R}\), then form \(\mathbf{R}\mathbf{Q}\). Converges to upper triangular form with eigenvalues on the diagonal. This is what NumPy uses internally.
  • Lanczos / Arnoldi Methods: For sparse matrices. Find a few eigenvalues (e.g., the top 10) without computing all of them. Essential for large-scale PCA and spectral clustering.
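The power method from the first bullet is short enough to implement from scratch; this is a minimal sketch on a made-up matrix whose dominant eigenvalue is 5:

```python
import numpy as np

def power_method(A, iters=200, seed=0):
    """Power iteration: repeatedly multiply by A and renormalize.
    Returns an estimate of the dominant eigenvalue and its unit eigenvector."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)      # renormalize to avoid overflow
    lam = x @ A @ x                    # Rayleigh quotient as the eigenvalue estimate
    return lam, x

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # dominant eigenvalue 5, eigenvector along (1, 1)
lam, v = power_method(A)
```

Convergence is governed by the ratio \(|\lambda_2/\lambda_1|\) (here 2/5), so a couple hundred iterations is far more than enough.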

4.4 Matrix Functions & Exponentials Helpful

$$e^{\mathbf{A}t} = \mathbf{P}\begin{pmatrix} e^{\lambda_1 t} & & \\ & e^{\lambda_2 t} & \\ & & \ddots \end{pmatrix}\mathbf{P}^{-1}$$

Any analytic function can be applied to a matrix via its eigenvalues: \(f(\mathbf{A}) = \mathbf{P}\,f(\mathbf{D})\,\mathbf{P}^{-1}\). This includes \(\sin(\mathbf{A})\), \(\cos(\mathbf{A})\), \(\log(\mathbf{A})\), \(\sqrt{\mathbf{A}}\), etc.

4.5 Perturbation Theory Optional

How much do eigenvalues change when the matrix changes slightly? This matters for numerical stability and noisy data.

Weyl's Theorem: For symmetric matrices, if \(\|\mathbf{E}\|\) is small, then eigenvalues of \(\mathbf{A} + \mathbf{E}\) are close to those of \(\mathbf{A}\). This is why PCA is robust to small noise.

Condition number: for a symmetric matrix, \(\kappa = |\lambda_{\max}|/|\lambda_{\min}|\). A large condition number means the matrix is close to singular, so computations involving it (such as solving \(\mathbf{A}\mathbf{x} = \mathbf{b}\)) amplify small errors.

4.6 Random Matrix Theory Optional

Studies eigenvalue distributions of random matrices. Key results:

  • Marchenko-Pastur Law: Eigenvalue distribution of sample covariance matrices when \(n, p \to \infty\). Used to separate "real" eigenvalues from noise in PCA.
  • Tracy-Widom Distribution: Distribution of the largest eigenvalue of random matrices.
  • Wigner's Semicircle Law: Eigenvalue distribution of symmetric random matrices.

Applications: Finance (Markowitz portfolio theory with noisy data), genomics, wireless communications.

4.7 Tensor Decompositions Optional

Tensors generalize matrices to higher dimensions (3D arrays and beyond). Tensor eigenvalues/decompositions are an active research area.

  • CP Decomposition: Generalizes eigendecomposition to tensors
  • Tucker Decomposition: Higher-order SVD

Applications: Deep learning (weight tensors), signal processing, chemometrics, brain imaging (fMRI).

Phase 5 — Tools & Co-Requisites

Supporting Knowledge & Practical Skills

5.1 Statistics Prerequisites (for PCA) Critical

You need these statistics concepts to understand PCA and covariance-based eigenvalue applications:

  • Mean: \(\bar{x} = \frac{1}{n}\sum x_i\)
  • Variance: \(\sigma^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2\) — measures spread of one variable
  • Standard Deviation: \(\sigma = \sqrt{\text{variance}}\)
  • Covariance: \(\text{cov}(X,Y) = \frac{1}{n-1}\sum(x_i - \bar{x})(y_i - \bar{y})\) — measures how two variables move together
  • Correlation: \(r = \text{cov}(X,Y) / (\sigma_X \sigma_Y)\) — standardized covariance, between -1 and 1
  • Covariance Matrix: \(\mathbf{\Sigma}_{ij} = \text{cov}(X_i, X_j)\) — symmetric, positive semi-definite
  • Standardization (z-scores): \(z = (x - \mu)/\sigma\) — needed before PCA when features have different scales

5.2 Calculus Prerequisites (for Differential Equations & Optimization) Important

  • Derivatives: Basic differentiation rules
  • Exponential function: \(e^x\), \(\frac{d}{dt}e^{at} = ae^{at}\) — appears in solutions of \(\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}\)
  • Partial derivatives: For multivariable functions, needed for gradients and Hessians
  • Gradient: \(\nabla f = \begin{pmatrix}\partial f/\partial x_1 \\ \vdots \\ \partial f/\partial x_n\end{pmatrix}\)
  • Hessian Matrix: \(\mathbf{H}_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\) — its eigenvalues determine convexity
  • Taylor expansion: Connects Hessian to quadratic forms

5.3 Probability Prerequisites (for Markov Chains & Statistics) Important

  • Probability basics: Events, conditional probability, Bayes' theorem
  • Random variables: Expected value \(E[X]\), variance \(\text{Var}(X)\)
  • Joint distributions: How multiple variables relate — connects to covariance
  • Transition probabilities: \(P(\text{state } j \to \text{state } i)\) — the entries of Markov transition matrices
  • Normal distribution: PCA works best when data is approximately Gaussian. The covariance matrix fully characterizes a multivariate normal.

5.4 Python / NumPy / scikit-learn Critical

In practice, nobody hand-computes eigenvalues for matrices larger than 3×3. You need to know:

  • NumPy: np.linalg.eig(A), np.linalg.eigh(A) (symmetric), np.linalg.svd(A)
  • SciPy: scipy.linalg.eig(A), scipy.sparse.linalg.eigs(A, k=10) (for large sparse matrices — find top \(k\) eigenvalues)
  • scikit-learn: sklearn.decomposition.PCA — the high-level PCA interface that handles everything
  • Pandas: Data manipulation before feeding into PCA
  • Matplotlib: Visualizing eigenvectors, scree plots, PCA projections

5.5 Recommended Study Plan & Timeline

  • Weeks 1–2 (Prerequisites): Vectors, matrices, matrix multiplication, determinants. Practice: compute 10+ determinants (2×2 and 3×3); multiply matrices by hand.
  • Week 3 (Prerequisites): Gaussian elimination, linear independence, rank, null space. Practice: row-reduce 5+ matrices; find null spaces.
  • Weeks 4–5 (Core Eigen): Eigenvalue definition, characteristic equation, finding eigenvectors. Practice: solve 15+ eigenvalue problems by hand (2×2 and 3×3).
  • Week 6 (Core Eigen): Diagonalization, symmetric matrices, spectral theorem. Practice: diagonalize 5+ matrices; verify \(\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}\).
  • Week 7 (Applications): PCA — covariance, variance, projection. Practice: do PCA by hand on 2D data, then use scikit-learn on a real dataset.
  • Week 8 (Applications): SVD, Markov chains, differential equations basics. Practice: find the steady state of a Markov chain; compress an image with SVD in Python.
  • Weeks 9–10 (Applications): Spectral clustering, quadratic forms, optimization. Practice: implement spectral clustering on a toy dataset; analyze Hessian eigenvalues.
  • Weeks 11–12 (Advanced): Numerical methods, Jordan form, perturbation theory. Practice: implement the power method from scratch; read a research paper.
Professor's Advice

Don't skip Phase 1. I've seen many students struggle with eigenvalues not because the concept is hard, but because they're shaky on matrix multiplication, determinants, or row reduction. Spend the time on prerequisites — it pays off enormously.

Watch 3Blue1Brown FIRST. Before doing any computation, watch the "Essence of Linear Algebra" series. The visual intuition will make everything click 10× faster.

Do problems by hand AND by code. Hand computation builds understanding. Code handles real-world scale. You need both.

Self-Assessment Checklist

Before moving on, ask yourself whether you can confidently explain each topic in Phases 1–3 above. If not, revisit that topic.


The Complete Picture

Eigenvalues & eigenvectors sit at the intersection of linear algebra, calculus, statistics, and probability. To truly master them, you need a solid foundation in all four. But don't be overwhelmed — start with Phase 1, build gradually, and the connections will emerge naturally. Every new concept you learn will make the eigenvalue story richer and more powerful.



"In mathematics, you don't understand things. You just get used to them." — John von Neumann

Complete Learning Roadmap for Eigenvalues & Eigenvectors in Data Science • March 2026

© 2026 Sim Vattanac. All rights reserved.