Introduction
Spectral graph theory studies the properties of graphs via the eigenvalues and eigenvectors of their associated matrices: the adjacency matrix and the graph Laplacian together with its variants. The Laplacian provides a natural link between discrete representations, such as graphs, and continuous representations, such as vector spaces and manifolds. Its most prominent application is spectral clustering, a computationally tractable relaxation of the graph partitioning problem. Another application is spectral matching, which addresses graph matching.
Basic notations
We consider simple graphs (no multiple edges or loops), $\mathcal{G}=(\mathcal{V}, \mathcal{E})$ :
$\mathcal{V}(\mathcal{G})=\left\{v_{1}, \ldots, v_{n}\right\}$ is called the vertex set with $n=|\mathcal{V}|$;
$\mathcal{E}(\mathcal{G})=\left\{e_{i j}\right\}$ is called the edge set with $m=|\mathcal{E}|$;
An edge $e_{i j}$ connects vertices $v_{i}$ and $v_{j}$ if they are adjacent or neighbors. One possible notation for adjacency is $v_{i} \sim v_{j}$;
The number of neighbors of a node $v$ is called the degree of $v$ and is denoted by $d(v)$; for an unweighted graph, $d(v_i)$ is the number of edges incident to $v_i$. If all the nodes of a graph have the same degree, the graph is regular. The nodes of an Eulerian graph all have even degree.
A graph is complete if there is an edge between every pair of vertices.
Subgraph of a graph
$\mathcal{H}$ is a subgraph of $\mathcal{G}$ if $\mathcal{V}(\mathcal{H}) \subseteq \mathcal{V}(\mathcal{G})$ and $\mathcal{E}(\mathcal{H}) \subseteq \mathcal{E}(\mathcal{G})$;
A subgraph $\mathcal{H}$ is an induced subgraph of $\mathcal{G}$ if two vertices of $\mathcal{V}(\mathcal{H})$ are adjacent in $\mathcal{H}$ if and only if they are adjacent in $\mathcal{G}$.
A clique is a complete subgraph of a graph.
A path of $k$ vertices is a sequence of $k$ distinct vertices such that consecutive vertices are adjacent.
A cycle is a connected subgraph where every vertex has exactly two neighbors.
A graph containing no cycles is a forest. A connected forest is a tree.
A k-partite graph
- A graph is called k-partite if its set of vertices admits a partition into $k$ classes such that the vertices of the same class are not adjacent.
- An example of a bipartite graph.
The adjacency matrix of a graph
- For a graph with $n$ vertices, the entries of the $n \times n$ adjacency matrix are defined by:
$$ \mathbf{A}:= \begin{cases}A_{i j}=1 & \text { if there is an edge } e_{i j} \\ A_{i j}=0 & \text { if there is no edge } \\ A_{i i}=0 & \end{cases} $$
$$ \begin{aligned} & \mathbf{A}=\left[\begin{array}{llll}0 & 1 & 1 & 0 \\1 & 0 & 1 & 1 \\1 & 1 & 0 & 0 \\0 & 1 & 0 & 0\end{array}\right] \end{aligned} $$
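Below is a minimal numpy sketch of this example adjacency matrix (the 0-based vertex ordering and the edge list are read off the matrix above):

```python
import numpy as np

# Adjacency matrix of the 4-vertex example graph:
# edges (v1,v2), (v1,v3), (v2,v3), (v2,v4)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

assert (A == A.T).all()            # undirected graph: A is symmetric
assert (np.diag(A) == 0).all()     # simple graph: no loops
print(A.sum(axis=1))               # vertex degrees: [2 3 2 1]
```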
Eigenvalues and eigenvectors
A is a real-symmetric matrix: it has $n$ real eigenvalues and its $n$ real eigenvectors form an orthonormal basis.
Let $\left\{\lambda_{1}, \ldots, \lambda_{i}, \ldots, \lambda_{r}\right\}$ be the set of distinct eigenvalues.
The eigenspace $S_{i}$ contains the eigenvectors associated with $\lambda_{i}$ :
$$ S_{i}=\left\{\boldsymbol{x} \in \mathbb{R}^{n} \mid \mathbf{A} \boldsymbol{x}=\lambda_{i} \boldsymbol{x}\right\} $$
For real-symmetric matrices, the algebraic multiplicity is equal to the geometric multiplicity, for all the eigenvalues.
The dimension of $S_{i}$ (geometric multiplicity) is equal to the multiplicity of $\lambda_{i}$.
If $\lambda_{i} \neq \lambda_{j}$ then $S_{i}$ and $S_{j}$ are mutually orthogonal.
Real-valued functions on graphs
We consider real-valued functions on the set of the graph’s vertices, $\boldsymbol{f}: \mathcal{V} \longrightarrow \mathbb{R}$. Such a function assigns a real number to each graph node.
$\boldsymbol{f}$ is a vector indexed by the graph’s vertices, hence $\boldsymbol{f} \in \mathbb{R}^{n}$.
Notation: $\boldsymbol{f}=\left(f\left(v_{1}\right), \ldots, f\left(v_{n}\right)\right)=(f(1), \ldots, f(n))$.
The eigenvectors of the adjacency matrix, $\mathbf{A} \boldsymbol{x}=\lambda \boldsymbol{x}$, can be viewed as eigenfunctions.
Matrix A as an operator and quadratic form
- The adjacency matrix can be viewed as an operator
$$ \boldsymbol{g}=\mathbf{A} \boldsymbol{f}; \quad g(i)=\sum_{j \sim i} f(j) $$
- It can also be viewed as a quadratic form:
$$ \boldsymbol{f}^{\top} \mathbf{A} \boldsymbol{f}=\sum_{i=1}^{n} \sum_{j \sim i} f(i) f(j)=2 \sum_{e_{i j} \in \mathcal{E}} f(i) f(j) $$
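A small numpy check of both views on the example graph above, for an arbitrary function $\boldsymbol{f}$ (a sketch; the factor 2 appears because each edge contributes twice to the double sum):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
edges = [(0, 1), (0, 2), (1, 2), (1, 3)]   # the four edges e_ij (0-based)
f = np.array([1.0, -2.0, 0.5, 3.0])        # any function on the vertices

g = A @ f                                  # operator view: g(i) = sum_{j ~ i} f(j)
quad = f @ A @ f                           # quadratic form
assert np.isclose(quad, 2 * sum(f[i] * f[j] for i, j in edges))
```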
The incidence matrix of a graph
Let each edge in the graph have an arbitrary but fixed orientation;
The incidence matrix of a graph is an $|\mathcal{E}| \times|\mathcal{V}|$ (i.e., $m \times n$) matrix defined as follows:
$$ \nabla:= \begin{cases}\nabla_{e v}=-1 & \text { if } v \text { is the initial vertex of edge } e \\ \nabla_{e v}=1 & \text { if } v \text { is the terminal vertex of edge } e \\ \nabla_{e v}=0 & \text { if } v \text { is not in } e\end{cases} $$
$$ \begin{aligned} & \nabla=\left[\begin{array}{cccc}-1 & 1 & 0 & 0 \\1 & 0 & -1 & 0 \\0 & -1 & 1 & 0 \\0 & -1 & 0 & +1\end{array}\right] \end{aligned} $$
The incidence matrix: A discrete differential operator
The mapping $\boldsymbol{f} \longrightarrow \nabla \boldsymbol{f}$ is known as the co-boundary mapping of the graph.
$(\nabla \boldsymbol{f})\left(e_{i j}\right)=f\left(v_{j}\right)-f\left(v_{i}\right)$
$$ \left(\begin{array}{c} f(2)-f(1) \\ f(1)-f(3) \\ f(3)-f(2) \\ f(4)-f(2) \end{array}\right)=\left[\begin{array}{cccc} -1 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & -1 & 0 & +1 \end{array}\right]\left(\begin{array}{c} f(1) \\ f(2) \\ f(3) \\ f(4) \end{array}\right) $$
The Laplacian matrix of a graph
$\mathbf{L}=\nabla^{\top} \nabla$
$(\mathbf{L} \boldsymbol{f})\left(v_{i}\right)=\sum_{v_{j} \sim v_{i}}\left(f\left(v_{i}\right)-f\left(v_{j}\right)\right)$
Connection between the Laplacian and the adjacency matrices:
$$ \mathbf{L}=\mathbf{D}-\mathbf{A} $$
- The degree matrix $\mathbf{D}$ is diagonal with $D_{i i}=d\left(v_{i}\right)$.
$$ \mathbf{L}=\left[\begin{array}{cccc} 2 & -1 & -1 & 0 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 2 & 0 \\ 0 & -1 & 0 & 1 \end{array}\right] $$
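A numpy sketch verifying, on the same example graph, that $\nabla^{\top}\nabla$ and $\mathbf{D}-\mathbf{A}$ both produce this Laplacian (the edge orientations follow the incidence matrix above; the result does not depend on them):

```python
import numpy as np

# Oriented incidence matrix of the example graph (one row per edge)
grad = np.array([[-1,  1,  0,  0],
                 [ 1,  0, -1,  0],
                 [ 0, -1,  1,  0],
                 [ 0, -1,  0,  1]])
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
D = np.diag(A.sum(axis=1))        # degree matrix

L = grad.T @ grad                 # L = incidence^T @ incidence
assert (L == D - A).all()         # ... which equals D - A
print(L)
```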
The Laplacian matrix of an undirected weighted graph
We consider undirected weighted graphs: Each edge $e_{i j}$ is weighted by $w_{i j}>0$.
The Laplacian as an operator:
$$ (\mathbf{L} \boldsymbol{f})\left(v_{i}\right)=\sum_{v_{j} \sim v_{i}} w_{i j}\left(f\left(v_{i}\right)-f\left(v_{j}\right)\right) $$
- As a quadratic form:
$$ \boldsymbol{f}^{\top} \mathbf{L} \boldsymbol{f}=\frac{1}{2} \sum_{i, j=1}^{n} w_{i j}\left(f\left(v_{i}\right)-f\left(v_{j}\right)\right)^{2}=\sum_{e_{i j} \in \mathcal{E}} w_{i j}\left(f\left(v_{i}\right)-f\left(v_{j}\right)\right)^{2} $$
L is symmetric and positive semi-definite.
L has $n$ non-negative, real-valued eigenvalues: $0=\lambda_{1} \leq \lambda_{2} \leq \ldots \leq \lambda_{n} .$
The Laplacian of a 3D discrete surface (mesh)
A graph vertex $v_{i}$ is associated with a 3D point $\boldsymbol{v}_{i}$.
The weight of an edge $e_{i j}$ is defined by the Gaussian kernel:
$$ w_{i j}=\exp \left(-\left\|\boldsymbol{v}_{i}-\boldsymbol{v}_{j}\right\|^{2} / \sigma^{2}\right) $$
$0<w_{\min } \leq w_{i j} \leq w_{\max } \leq 1$
Hence, the geometric structure of the mesh is encoded in the weights.
Other weighting functions were proposed in the literature.
The Laplacian of a cloud of points
Two standard constructions for building a graph from a point cloud:
- a $k$-nearest-neighbor ($k$NN) graph (e.g., 3-nearest neighbors);
- an $\varepsilon$-radius graph.
A $k$NN graph may guarantee that the graph is connected (this depends on the implementation), whereas an $\varepsilon$-radius graph does not guarantee a single connected component; see the sketch below.
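A sketch, in plain numpy, of building a symmetrized $k$NN graph with Gaussian-kernel weights from a point cloud and forming its Laplacian; the function name, $k=3$ and $\sigma=1$ are illustrative choices, not part of the original notes:

```python
import numpy as np

def knn_laplacian(points, k=3, sigma=1.0):
    """Weighted Laplacian of a symmetrized k-NN graph over a point cloud."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]          # k nearest neighbors (skip self)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma**2)
    W = np.maximum(W, W.T)                         # symmetrize: keep an edge if either side picked it
    D = np.diag(W.sum(axis=1))
    return D - W                                   # L = D - W

points = np.random.rand(20, 3)                     # a toy 3D point cloud
L = knn_laplacian(points)
```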
The Laplacian of a graph with one connected component
$\mathbf{L} \boldsymbol{u}=\lambda \boldsymbol{u}$.
$\mathbf{L} \mathbf{1}_{n}=\mathbf{0}, \lambda_{1}=0$ is the smallest eigenvalue.
The one vector: $\mathbf{1}_{n}=(1 \ldots 1)^{\top}$.
$0=\boldsymbol{u}^{\top} \mathbf{L} \boldsymbol{u}=\frac{1}{2}\sum_{i, j=1}^{n} w_{i j}(u(i)-u(j))^{2}$.
If any two vertices are connected by a path, then $\boldsymbol{u}=(u(1), \ldots, u(n))$ must be constant over all vertices for the quadratic form to vanish. Therefore, a graph with one connected component has the constant vector $\boldsymbol{u}_{1}=\mathbf{1}_{n}$ as the only eigenvector associated with the eigenvalue 0.
A graph with $k>1$ connected components
- Each connected component has an associated Laplacian. Therefore, we can write matrix $\mathbf{L}$ as a block diagonal matrix:
$$ \mathbf{L}=\left[\begin{array}{lll} \mathbf{L}_{1} & & \\ & \ddots & \\ & & \mathbf{L}_{k} \end{array}\right] $$
The spectrum of $\mathbf{L}$ is given by the union of the spectra of $\mathbf{L}_{i}$.
Each block corresponds to a connected component, hence each matrix $\mathbf{L}_{i}$ has an eigenvalue 0 with multiplicity 1 .
The eigenvalue $\lambda_{1}=0$ has multiplicity $k$.
The eigenspace of $\lambda_{1}=0$ with multiplicity $k$
- The eigenspace corresponding to $\lambda_{1}=\ldots=\lambda_{k}=0$ is spanned by the $k$ mutually orthogonal vectors:
$$ \begin{aligned} \boldsymbol{u}_{1} &=\mathbf{1}_{L_{1}} \\ & \cdots \\ \boldsymbol{u}_{k} &=\mathbf{1}_{L_{k}} \end{aligned} $$
with $\mathbf{1}_{L_{i}} \in \mathbb{R}^{n}$ the vector whose entries are 1 at the vertices of the $i$-th connected component and 0 elsewhere, e.g., $\mathbf{1}_{L_{i}}=(0 \ldots 0\; 1 \ldots 1\; 0 \ldots 0)^{\top}$.
These vectors are the indicator vectors of the graph’s connected components.
Notice that $\mathbf{1}_{L_{1}}+\ldots+\mathbf{1}_{L_{k}}=\mathbf{1}_{n}$
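A numpy sketch checking that the multiplicity of the eigenvalue 0 equals the number of connected components, on an assumed toy graph made of two disjoint triangles:

```python
import numpy as np

# Two disjoint triangles: vertices {0,1,2} and {3,4,5}
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)
k = np.sum(np.isclose(eigvals, 0))       # multiplicity of the eigenvalue 0
print(k)                                 # 2 = number of connected components
# The corresponding eigenspace is spanned by the two component indicator vectors.
```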
The Fiedler vector of the graph Laplacian
The first non-null eigenvalue $\lambda_{k+1}$ is called the Fiedler value.
The corresponding eigenvector $\boldsymbol{u}_{k+1}$ is called the Fiedler vector.
The multiplicity of the Fiedler value is typically equal to 1, but not always: for highly symmetric graphs such as the complete graph $K_n$ it can be larger ($n-1$ for $K_n$).
The Fiedler value is the algebraic connectivity of a graph: the farther it is from 0, the more strongly connected the graph.
The Fiedler vector has been extensively used for spectral bipartitioning.
Theoretical results are summarized in Spielman & Teng 2007: http://cs-www.cs.yale.edu/homes/spielman/
Eigenvectors of the Laplacian of connected graphs
$\boldsymbol{u}_{1}=\mathbf{1}_{n}, \mathbf{L} \mathbf{1}_{n}=\mathbf{0}$.
$\boldsymbol{u}_{2}$ is the Fiedler vector with multiplicity 1.
The eigenvectors form an orthonormal basis: $\boldsymbol{u}_{i}^{\top} \boldsymbol{u}_{j}=\delta_{i j}$.
For any eigenvector $\boldsymbol{u}_{i}=\left(\boldsymbol{u}_{i}\left(v_{1}\right) \ldots \boldsymbol{u}_{i}\left(v_{n}\right)\right)^{\top}, 2 \leq i \leq n$ :
$$ \boldsymbol{u}_{i}^{\top} \mathbf{1}_{n}=0 $$
- Hence the components of $\boldsymbol{u}_{i}, 2 \leq i \leq n$ satisfy:
$$ \sum_{j=1}^{n} \boldsymbol{u}_{i}\left(v_{j}\right)=0 $$
- Each component is bounded by:
$$ -1<\boldsymbol{u}_{i}\left(v_{j}\right)<1 $$
Laplacian embedding: Mapping a graph on a line
- Map a weighted graph onto a line such that connected nodes stay as close as possible, i.e., minimize $\sum_{i, j=1}^{n} w_{i j}\left(f\left(v_{i}\right)-f\left(v_{j}\right)\right)^{2}$, or:
$$ \arg \min _{\boldsymbol{f}} \boldsymbol{f}^{\top} \mathbf{L} \boldsymbol{f} \text { with: } \boldsymbol{f}^{\top} \boldsymbol{f}=1 \text { and } \boldsymbol{f}^{\top} \mathbf{1}=0 $$
The solution is the eigenvector associated with the smallest nonzero eigenvalue of the eigenvalue problem: $\mathbf{L} \boldsymbol{f}=\lambda \boldsymbol{f}$, namely the Fiedler vector $\boldsymbol{u}_{2}$.
For more details on this minimization see Golub & Van Loan Matrix Computations, chapter 8 (The symmetric eigenvalue problem).
Example: mapping a graph onto its Fiedler vector (a numerical sketch follows).
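A minimal numpy sketch, using the 4-vertex example graph from earlier:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                  # u_2: eigenvector of the smallest non-zero eigenvalue
print(np.round(fiedler, 3))              # 1-D coordinates of the 4 vertices on the line
# Adjacent vertices receive nearby coordinates; the sign pattern of u_2
# already suggests a bi-partition of the graph.
```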
Laplacian embedding
Embed the graph in a $k$-dimensional Euclidean space. The embedding is given by the $n \times k$ matrix $\mathbf{F}=\left[\boldsymbol{f}_{1}\, \boldsymbol{f}_{2} \ldots \boldsymbol{f}_{k}\right]$, where the $i$-th row of this matrix, $\boldsymbol{f}^{(i)}$, contains the Euclidean coordinates of the $i$-th graph vertex $v_{i}$.
We need to minimize:
$$ \arg \min_{\boldsymbol{f}_{1}, \ldots, \boldsymbol{f}_{k}} \sum_{i, j=1}^{n} w_{i j}\left\|\boldsymbol{f}^{(i)}-\boldsymbol{f}^{(j)}\right\|^{2} \quad \text{with: } \mathbf{F}^{\top} \mathbf{F}=\mathbf{I} $$
- The solution is provided by the matrix of eigenvectors corresponding to the $k$ lowest nonzero eigenvalues of the eigenvalue problem $\mathbf{L} \boldsymbol{f}=\lambda \boldsymbol{f}$.
Spectral embedding using the unnormalized Laplacian
Compute the eigendecomposition of $\mathbf{L}=\mathbf{D}-\mathbf{A}$.
Select the $k$ smallest non-null eigenvalues $\lambda_{2} \leq \ldots \leq \lambda_{k+1}$
The eigengap is $\lambda_{k+2}-\lambda_{k+1}$.
We obtain the $n \times k$ matrix $\mathbf{U}=\left[\boldsymbol{u}_{2} \ldots \boldsymbol{u}_{k+1}\right]$ :
$$ \mathbf{U}=\left[\begin{array}{ccc} \boldsymbol{u}_{2}\left(v_{1}\right) & \ldots & \boldsymbol{u}_{k+1}\left(v_{1}\right) \\ \vdots & & \vdots \\ \boldsymbol{u}_{2}\left(v_{n}\right) & \ldots & \boldsymbol{u}_{k+1}\left(v_{n}\right) \end{array}\right] $$
$\boldsymbol{u}_{i}^{\top} \boldsymbol{u}_{j}=\delta_{i j}$ (orthonormal vectors), hence $\mathbf{U}^{\top} \mathbf{U}=\mathbf{I}_{k}$.
Column $i(2 \leq i \leq k+1)$ of this matrix is a mapping on the eigenvector $\boldsymbol{u}_{i}$.
Euclidean L-embedding of the graph’s vertices
- (Euclidean) L-embedding of a graph:
$$ \mathbf{X}=\boldsymbol{\Lambda}_{k}^{-\frac{1}{2}} \mathbf{U}^{\top}=\left[\begin{array}{llll} \boldsymbol{x}_{1} & \ldots & \boldsymbol{x}_{j} \ldots & \boldsymbol{x}_{n} \end{array}\right] $$
The coordinates of a vertex $v_{j}$ are:
$$ \boldsymbol{x}_{j}=\left(\begin{array}{c} \frac{\boldsymbol{u}_{2}\left(v_{j}\right)}{\sqrt{\lambda_{2}}} \\ \vdots \\ \frac{\boldsymbol{u}_{k+1}\left(v_{j}\right)}{\sqrt{\lambda_{k+1}}} \end{array}\right) $$
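A sketch of this L-embedding for a generic connected weighted graph; the helper name and the use of numpy.linalg.eigh are implementation choices, not part of the notes:

```python
import numpy as np

def laplacian_embedding(W, k):
    """Euclidean L-embedding: column x_j holds the k-dim coordinates of vertex v_j."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)              # ascending eigenvalues
    U = eigvecs[:, 1:k + 1]                           # u_2 ... u_{k+1}  (graph assumed connected)
    lam = eigvals[1:k + 1]                            # lambda_2 ... lambda_{k+1}
    X = (U / np.sqrt(lam)).T                          # X = Lambda_k^{-1/2} U^T, shape (k, n)
    return X

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
print(laplacian_embedding(W, k=2))                    # 2-D coordinates of the 4 vertices
```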
Justification for choosing the L-embedding
Both the commute-time distance (CTD) and the principal-component analysis of a graph (graph PCA) are important concepts: they allow one to reason "statistically" about a graph, and both are associated with the unnormalized Laplacian matrix.
The commute-time distance
The CTD is a well known quantity in Markov chains;
It is the average number of (weighted) steps that a random walk takes, starting at vertex $v_{i}$, to reach vertex $v_{j}$ for the first time and return;
The CTD decreases as the number of connections between the two nodes increases;
It captures the connectivity structure of a whole graph volume rather than a single path between the two vertices, as the shortest-path (geodesic) distance does.
The CTD can be computed in closed form from the L-embedding (using the full spectrum, i.e., $k=n-1$):
$$ \operatorname{CTD}^{2}\left(v_{i}, v_{j}\right)=\operatorname{vol}(\mathcal{G})\left\|\boldsymbol{x}_{i}-\boldsymbol{x}_{j}\right\|^{2} $$
where $\operatorname{vol}(\mathcal{G})=\sum_{i=1}^{n} d\left(v_{i}\right)$ is the volume of the graph.
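A numerical sketch of this identity on the 4-vertex example graph, cross-checked against the classical expression of the commute time in terms of the Laplacian pseudo-inverse:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
d = A.sum(axis=1)
vol = d.sum()
L = np.diag(d) - A

# Full L-embedding (k = n - 1)
eigvals, eigvecs = np.linalg.eigh(L)
X = (eigvecs[:, 1:] / np.sqrt(eigvals[1:])).T        # shape (n-1, n)

ctd2 = vol * ((X[:, 0] - X[:, 2]) ** 2).sum()        # CTD^2(v1, v3) from the embedding
Lp = np.linalg.pinv(L)                               # Laplacian pseudo-inverse
assert np.isclose(ctd2, vol * (Lp[0, 0] + Lp[2, 2] - 2 * Lp[0, 2]))
```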
The graph PCA
- The mean (remember that $\sum_{j=1}^{n} \boldsymbol{u}_{i}\left(v_{j}\right)=0$ ):
$$ \overline{\boldsymbol{x}}=\frac{1}{n} \sum_{j=1}^{n} \boldsymbol{x}_{j}=\frac{1}{n}\boldsymbol{\Lambda}_{k}^{-\frac{1}{2}}\left(\begin{array}{c} \sum_{j=1}^{n} \boldsymbol{u}_{2}\left(v_{j}\right) \\ \vdots \\ \sum_{j=1}^{n} \boldsymbol{u}_{k+1}\left(v_{j}\right) \end{array}\right)=\left(\begin{array}{c} 0 \\ \vdots \\ 0 \end{array}\right) $$
- The covariance matrix:
$$ \mathbf{S}=\frac{1}{n} \sum_{j=1}^{n} \boldsymbol{x}_{j} \boldsymbol{x}_{j}^{\top}=\frac{1}{n} \mathbf{X} \mathbf{X}^{\top}=\frac{1}{n} \boldsymbol{\Lambda}_{k}^{-\frac{1}{2}} \mathbf{U}^{\top} \mathbf{U} \boldsymbol{\Lambda}_{k}^{-\frac{1}{2}}=\frac{1}{n} \boldsymbol{\Lambda}_{k}^{-1} $$
- The vectors $\boldsymbol{u}_{2}, \ldots, \boldsymbol{u}_{k+1}$ are the directions of maximum variance of the graph embedding, with $\lambda_{2}^{-1} \geq \ldots \geq \lambda_{k+1}^{-1}$.
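A quick numpy check of the zero mean and of $\mathbf{S}=\frac{1}{n}\boldsymbol{\Lambda}_{k}^{-1}$ on the 4-vertex example graph (with $k=2$ as an arbitrary choice):

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
k, n = 2, 4
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
U, lam = eigvecs[:, 1:k + 1], eigvals[1:k + 1]
X = (U / np.sqrt(lam)).T                      # L-embedding, shape (k, n)

assert np.allclose(X.mean(axis=1), 0)         # zero mean: the embedding is centered
S = (X @ X.T) / n                             # covariance matrix
assert np.allclose(S, np.diag(1 / lam) / n)   # S = (1/n) * Lambda_k^{-1}
```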
Other Laplacian matrices
- The normalized graph Laplacian (symmetric and positive semi-definite):
$$ \mathbf{L}_{n}=\mathbf{D}^{-\frac{1}{2}} \mathbf{L} \mathbf{D}^{-\frac{1}{2}}=\mathbf{I}-\mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}} $$
- The transition matrix (allows an analogy with Markov chains):
$$ \mathbf{L}_{t}=\mathbf{D}^{-1} \mathbf{A} $$
- The random-walk graph Laplacian:
$$ \mathbf{L}_{r}=\mathbf{D}^{-1} \mathbf{L}=\mathbf{I}-\mathbf{L}_{t} $$
- These matrices are similar:
$$ \mathbf{L}_{r}=\mathbf{D}^{-1}\mathbf{L}=\mathbf{D}^{-\frac{1}{2}}\left(\mathbf{D}^{-\frac{1}{2}} \mathbf{L} \mathbf{D}^{-\frac{1}{2}}\right) \mathbf{D}^{\frac{1}{2}}=\mathbf{D}^{-\frac{1}{2}} \mathbf{L}_{n} \mathbf{D}^{\frac{1}{2}} $$
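A small numerical check that $\mathbf{L}_{n}$ and $\mathbf{L}_{r}$ indeed share the same eigenvalues (a sketch on the 4-vertex example graph):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A
D_inv_sqrt = np.diag(1 / np.sqrt(np.diag(D)))

L_n = D_inv_sqrt @ L @ D_inv_sqrt            # normalized Laplacian (symmetric)
L_r = np.linalg.inv(D) @ L                   # random-walk Laplacian (not symmetric)

# Similar matrices have the same spectrum
ev_n = np.sort(np.linalg.eigvals(L_n).real)
ev_r = np.sort(np.linalg.eigvals(L_r).real)
assert np.allclose(ev_n, ev_r)
```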
Eigenvalues and eigenvectors of $\mathrm{L}_{n}$ and $\mathrm{L}_{r}$
- $\mathbf{L}_{r} \boldsymbol{w}=\lambda \boldsymbol{w} \Longleftrightarrow \mathbf{L} \boldsymbol{w}=\lambda \mathbf{D} \boldsymbol{w}$, hence:
$$ \mathbf{L}_{r}: \quad \lambda_{1}=0 ; \quad \boldsymbol{w}_{1}=\mathbf{1} $$
- $\mathbf{L}_{n} \boldsymbol{v}=\lambda \boldsymbol{v}$. By virtue of the similarity transformation between the two matrices:
$$ \mathbf{L}_{n}: \quad \lambda_{1}=0 \quad \boldsymbol{v}_{1}=\mathbf{D}^{\frac{1}{2}} \mathbf{1} $$
- More generally, the two matrices have the same eigenvalues:
$$ 0=\lambda_{1} \leq \ldots \leq \lambda_{i} \ldots \leq \lambda_{n} $$
- Their eigenvectors are related by:
$$ \boldsymbol{v}_{i}=\mathbf{D}^{\frac{1}{2}} \boldsymbol{w}_{i}, \forall i=1 \ldots n $$
Spectral embedding using the random-walk Laplacian $\mathbf{L}_{r}$
- The $n \times k$ matrix $\mathbf{W}$ contains the first $k$ non-trivial eigenvectors of $\mathbf{L}_{r}$:
$$ \mathbf{W}=\left[\begin{array}{lll} \boldsymbol{w}_{2} & \ldots & \boldsymbol{w}_{k+1} \end{array}\right] $$
- It is straightforward to obtain the following expressions, where $\boldsymbol{d}$ and $\mathbf{D}$ are the degree-vector and the degree-matrix:
$$ \begin{gathered} \boldsymbol{w}_{i}^{\top} \boldsymbol{d}=0, \forall i, 2 \leq i \leq n \\ \mathbf{W}^{\top} \mathbf{D W}=\mathbf{I}_{k} \end{gathered} $$
- The isometric embedding using the random-walk Laplacian:
$$ \mathbf{Y}=\mathbf{W}^{\top}=\left[\begin{array}{lll} \boldsymbol{y}_{1} & \ldots & \boldsymbol{y}_{n} \end{array}\right] $$
The normalized additive Laplacian
- Some authors use the following matrix:
$$ \mathbf{L}_{a}=\frac{1}{d_{\max }}\left(\mathbf{A}+d_{\max } \mathbf{I}-\mathbf{D}\right) $$
- This matrix is closely related to L:
$$ \mathbf{L}_{a}=\frac{1}{d_{\max }}\left(d_{\max } \mathbf{I}-\mathbf{L}\right) $$
- and we have:
$$ \mathbf{L}_{a} \boldsymbol{u}=\mu \boldsymbol{u} \Longleftrightarrow \mathbf{L} \boldsymbol{u}=\lambda \boldsymbol{u}, \mu=1-\frac{\lambda}{d_{\max }} $$
The graph partitioning problem
- The graph-cut problem: Partition the graph such that:
(1) Edges between groups have very low weight, and
(2) Edges within a group have high weight.
$\operatorname{cut}\left(A_{1}, \ldots, A_{k}\right):=\frac{1}{2} \sum_{i=1}^{k} W\left(A_{i}, \bar{A}_{i}\right)$ with $W(A, B)=\sum_{i \in A, j \in B} w_{i j}$
- Ratio cut:
$$ \operatorname{RatioCut}\left(A_{1}, \ldots, A_{k}\right):=\frac{1}{2} \sum_{i=1}^{k} \frac{W\left(A_{i}, \bar{A}_{i}\right)}{\left|A_{i}\right|} $$
- Normalized cut:
$$ \operatorname{NCut}\left(A_{1}, \ldots, A_{k}\right):=\frac{1}{2} \sum_{i=1}^{k} \frac{W\left(A_{i}, \bar{A}_{i}\right)}{\operatorname{vol}\left(A_{i}\right)} $$
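A small numpy sketch evaluating the three criteria for a given partition of the 4-vertex example graph (the chosen bi-partition is only an illustration):

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
d = W.sum(axis=1)

def W_between(A, B):
    """W(A, B) = sum of edge weights from A to B."""
    return W[np.ix_(A, B)].sum()

def scores(parts):
    n = len(W)
    cut = ratio_cut = ncut = 0.0
    for Ai in parts:
        Abar = [j for j in range(n) if j not in Ai]
        w = W_between(Ai, Abar)
        cut += 0.5 * w
        ratio_cut += 0.5 * w / len(Ai)          # RatioCut normalizes by |A_i|
        ncut += 0.5 * w / d[Ai].sum()           # NCut normalizes by vol(A_i)
    return cut, ratio_cut, ncut

print(scores([[0, 2], [1, 3]]))                 # bi-partition {v1, v3} | {v2, v4}
```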
What is spectral clustering?
See my blog post on spectral clustering (in Chinese).
Both ratio-cut and normalized-cut minimizations are NP-hard problems.
Spectral clustering is a way to solve relaxed versions of these problems:
(1) The eigenvectors associated with the smallest non-null eigenvalues of the unnormalized Laplacian approximate the RatioCut minimization criterion, and
(2) the eigenvectors associated with the smallest non-null eigenvalues of the random-walk Laplacian approximate the NCut criterion.
Spectral clustering using the random-walk Laplacian
For details see (von Luxburg ‘07)
Input: Laplacian $\mathbf{L}_{r}$ and the number $k$ of clusters to compute.
Output: Clusters $C_{1}, \ldots, C_{k}$.
(1) Compute $\mathbf{W}$ formed with the first $k$ non-trivial eigenvectors of the random-walk Laplacian.
(2) Determine the spectral embedding $\mathbf{Y}=\mathbf{W}^{\top}$
(3) Cluster the columns $\boldsymbol{y}_{j}, j=1, \ldots, n$ into $k$ clusters using the K-means algorithm.
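A sketch of these three steps; the eigenvectors of $\mathbf{L}_{r}$ are obtained from the equivalent generalized problem $\mathbf{L}\boldsymbol{w}=\lambda\mathbf{D}\boldsymbol{w}$, and scikit-learn's KMeans is one possible choice for the last step (both are implementation choices, not prescribed by the notes):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering_rw(weights, k):
    """Spectral clustering with the random-walk Laplacian (weights: symmetric n x n matrix)."""
    D = np.diag(weights.sum(axis=1))          # assumes no isolated vertices (D invertible)
    L = D - weights
    # L_r w = lambda w   <=>   L w = lambda D w  (generalized symmetric eigenproblem)
    eigvals, eigvecs = eigh(L, D)             # eigenvalues in ascending order
    W = eigvecs[:, 1:k + 1]                   # first k non-trivial eigenvectors w_2 ... w_{k+1}
    Y = W                                     # one row of Y^T (= one column y_j of Y) per vertex
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```

The eigenvectors returned by scipy.linalg.eigh(L, D) are D-orthonormal, which matches the condition $\mathbf{W}^{\top} \mathbf{D} \mathbf{W}=\mathbf{I}_{k}$ stated earlier.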
K-means clustering
See Bishop'2006 (pages 424-428) for more details.
What is a cluster: a group of points whose inter-point distances are small compared to the distances to points outside the cluster.
Cluster centers: $\boldsymbol{\mu}_{1}, \ldots, \boldsymbol{\mu}_{k}$.
Goal: find an assignment of points to clusters as well as a set of vectors $\mu_{i}$.
Notation: for each point $\boldsymbol{y}_{j}$ there is a binary indicator variable $r_{j i} \in\{0,1\}$, equal to 1 if $\boldsymbol{y}_{j}$ is assigned to cluster $i$ and 0 otherwise.
Objective: minimize the following distortion measure:
$$ J=\sum_{j=1}^{n} \sum_{i=1}^{k} r_{j i}\left|\left|\boldsymbol{y}_{j}-\boldsymbol{\mu}_{i}\right|\right|^{2} $$
The K-means algorithm
(1) Initialization: Choose initial values for $\boldsymbol{\mu}_{1}, \ldots, \boldsymbol{\mu}_{k}$.
(2) First step: Assign the $j$-th point to the closest cluster center:
$$ r_{j i}= \begin{cases}1 & \text { if } i=\arg \min_{l}\left|\left|\boldsymbol{y}_{j}-\mu_{l}\right|\right|^{2} \\ 0 & \text { otherwise }\end{cases} $$
(3) Second Step: Minimize $J$ to estimate the cluster centers:
$$ \boldsymbol{\mu}_{i}=\frac{\sum_{j=1}^{n} r_{j i} \boldsymbol{y}_{j}}{\sum_{j=1}^{n} r_{j i}} $$
(4) Convergence: Repeat until no more change in the assignments.
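A minimal numpy sketch of the algorithm (initialization by sampling $k$ data points; convergence is detected when the assignments stop changing):

```python
import numpy as np

def kmeans(Y, k, n_iter=100, seed=0):
    """Basic K-means: Y has one point per row; returns assignments and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)].astype(float)  # initialization
    assign = None
    for _ in range(n_iter):
        # First step: assign each point to the closest cluster center
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        new_assign = dists.argmin(axis=1)
        if assign is not None and (new_assign == assign).all():
            break                                  # converged: assignments unchanged
        assign = new_assign
        # Second step: recompute each center as the mean of its assigned points
        for i in range(k):
            if (assign == i).any():
                centers[i] = Y[assign == i].mean(axis=0)
    return assign, centers
```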
The Laplacian and the Rayleigh quotient
As usual, for a graph $G=(V, E)$, let $A$ be its adjacency matrix and $D$ be the diagonal matrix with $D(v, v)=d_{v}$. Then, the random walk on $G$ will be taken according to the transition matrix $P=D^{-1} A$. We also define the stationary distribution $\pi$ with $\pi(x)=d_{x} / \operatorname{vol} G$.
Our discussion of random walks on $G$ left off with the result
$$ \left|\left|f P^{t}-\pi\right|\right|_{2} \leq \max_{i \neq 0}\left|\rho_{i}\right|^{t} \frac{\max_{x} \sqrt{d_{x}}}{\min_{y} \sqrt{d_{y}}} $$
where $f$ is a probability distribution (i.e. $f \geq 0$ and $\sum_{x} f(x)=1$ ) and $1=\rho_{0} \geq \rho_{1} \geq \ldots \geq \rho_{n-1}$ are the eigenvalues of $P$. This inequality implies that convergence to the stationary distribution $\pi$ will follow if $\max \left\{\left|\rho_{1}\right|,\left|\rho_{n-1}\right|\right\}<1$.
The transition probability matrix $P$ is similar to the matrix $M=D^{\frac{1}{2}} P D^{-\frac{1}{2}}$, so $P$ and $M$ have the same eigenvalues. We previously introduced the Laplacian of the graph as $\mathcal{L}=I-M$, so it has eigenvalues $0=\lambda_{0} \leq \lambda_{1} \leq \ldots \leq \lambda_{n-1}$ (where $\lambda_{i}=1-\rho_{i}$ ).
The main tool we’ll use to study the spectrum of $\mathcal{L}$ is the Rayleigh quotient $R(f)$ of $\mathcal{L}$, defined (for our purposes) as
$$ R(f)=\frac{f L f^{*}}{f D f^{*}} $$
where $L=D-A$ is the combinatorial Laplacian. This is the same as the usual sense of the Rayleigh quotient $g \mathcal{L} g^{*} / g g^{*}$ with the substitution $f=g D^{-\frac{1}{2}}$. Following this equivalence, if the $\phi_{i}$ are the eigenvectors of $\mathcal{L}$, we’ll call the $\psi_{i}=\phi_{i} D^{-\frac{1}{2}}$ the harmonic eigenvectors of $\mathcal{L}$.
Employing the Rayleigh quotient, we see that the eigenvalue $\lambda_{1}$ can be written as
$$ \lambda_{1}=\inf_{\substack{f \\ \sum_{x} f(x) d_{x}=0}} R(f) . $$
Since the eigenvector associated with $\lambda_{0}$ is $\phi_{0}=1 D^{\frac{1}{2}}$, the condition $\sum_{x} f(x) d_{x}=0$ is an orthogonality condition. Such variational characterizations can also be made for the other eigenvalues:
$$ \lambda_{n-1}=\sup _{f} R(f) $$
and, in general,
$$ \lambda_{i}=\sup_{h_{0}, h_{1}, \ldots, h_{i-1}} \; \inf_{\substack{f:\; \sum_{x} f(x) h_{j}(x) d_{x}=0 \\ \forall j \in\{0, \ldots, i-1\}}} R(f). $$
The following characterization of the Rayleigh quotient (demonstrated last time) will be useful later:
$$ R(f)=\frac{\sum_{x \sim y}(f(x)-f(y))^{2}}{\sum_{x} f^{2}(x) d_{x}}. $$
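A numerical illustration of the variational characterization of $\lambda_{1}$ on the 4-vertex example graph, using the equivalent generalized eigenproblem $L f=\lambda D f$ (an implementation convenience, not part of the notes):

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
d = A.sum(axis=1)
D = np.diag(d)
L = D - A                                       # combinatorial Laplacian

def rayleigh(f):
    return (f @ L @ f) / (f @ D @ f)            # R(f) = f L f^T / f D f^T

lam = eigh(L, D, eigvals_only=True)             # eigenvalues of the normalized Laplacian
print(lam[1])                                   # lambda_1 (first non-trivial eigenvalue)

# Any f with sum_x f(x) d_x = 0 gives R(f) >= lambda_1
f = np.array([1.0, -1.0, 2.0, 0.0])
f = f - (f @ d) / d.sum()                       # enforce the orthogonality condition
assert rayleigh(f) >= lam[1] - 1e-12
```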
To this point, we have done a lot of linear algebra. We are not here to teach linear algebra; we are here to take linear algebra one step further to understand what is happening in the graph.
The Cheeger Ratio and The Cheeger Constant
In many areas of mathematics the questions of “best” comes into play. What is the best bound for a given constant? What is the best way of row reducing a certain matrix? In this section, we will describe a way to make the “best possible cut” of a graph $G=(V, E)$, where a cut may be either an edge-cut or a vertex-cut, and this cut will split $G$ into two disconnected pieces.
We would like a way to measure the quality of a cut that is made to $G$. That is, would it be better to cut 4 edges, which causes us to lose 20 vertices, or to cut 10 edges, which results in the removal of 120 vertices?
Suppose we are given a graph $G=(V, E)$ and a subset $S \subseteq V$. We wish to define the following two sets:
$$ \partial S=\{\{u, v\} \in E \mid u \in S, v \notin S\} $$
and
$$ \delta S=\{v \notin S \mid v \sim u,\ u \in S\}. $$
Definition 1 For any vertex set $W$, the volume of $W$ is given by
$$ \operatorname{vol}(W)=\sum_{x \in W} d_{x}, $$
where $d_{x}$ is the degree of $x$ in $G$.
Definition 2 The Cheeger ratio for a vertex set $S$ is
$$ h(S)=\frac{|\partial S|}{\min \{\operatorname{vol}(S), \operatorname{vol}(\bar{S})\}}, $$
where $\bar{S}=V-S$.
It is first worth noting that, in terms of this definition of the Cheeger ratio, we are gauging the quality of our cut by measuring what has been cut off of $G$. There are other forms of the Cheeger ratio as well. For example, we can use $|\delta S|$ instead of $|\partial S|$, $|S|$ (or $|\bar{S}|$) instead of $\operatorname{vol}(S)$ (or $\operatorname{vol}(\bar{S})$), or $|S||\bar{S}|$ instead of $\min \{\operatorname{vol}(S), \operatorname{vol}(\bar{S})\}$.
Definition 3 For any graph $G=(V, E)$, the Cheeger Constant of $G$ is given by
$$ h_{G}=\min_S h(S) . $$
Now, if we consider the case where $\operatorname{vol}(S) \leq \frac{1}{2} \operatorname{vol}(G)$, then we can see that
$$ |\partial S| \geq h_{G}(\operatorname{vol}(S)) . $$
The Cheeger Inequality
Given a graph $G$, we can define $\lambda_{1}$ to be the first nontrivial eigenvalue of the Laplacian, $\mathcal{L}$, of $G$.
For any graph $G$,
$$ 2 h_{G} \geq \lambda_{1} \geq \frac{h_{G}^{2}}{2} $$
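A brute-force numpy check of this inequality on the small example graph used earlier (enumerating all vertex subsets is only feasible for tiny graphs; this is purely an illustration):

```python
import numpy as np
from itertools import combinations

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
n = len(A)
d = A.sum(axis=1)

def cheeger_ratio(S):
    S = set(S)
    boundary = sum(A[i, j] for i in S for j in range(n) if j not in S)   # |dS|
    vol_S = d[list(S)].sum()
    return boundary / min(vol_S, d.sum() - vol_S)

# Cheeger constant: minimum ratio over all proper non-empty subsets S
h_G = min(cheeger_ratio(S) for r in range(1, n) for S in combinations(range(n), r))

# First non-trivial eigenvalue of the normalized Laplacian
D_is = np.diag(1 / np.sqrt(d))
lam1 = np.sort(np.linalg.eigvalsh(D_is @ (np.diag(d) - A) @ D_is))[1]

assert h_G ** 2 / 2 <= lam1 <= 2 * h_G          # Cheeger inequality
```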
References
https://csustan.csustan.edu/~tom/Clustering/GraphLaplacian-tutorial.pdf
https://mathweb.ucsd.edu/~fan/teach/262/notes/paul/10_5_notes.pdf