$ \def\R{\mathbb{R}} \def\Real{\mathbb{R}} \def\X{\mathbb{X}} \def\Y{\mathbb{Y}} \def\N{\mathbb{N}} \def\Z{\mathbb{Z}} \def\e{\varepsilon} \def\h{\mathsf{H}} \def\F{\mathbb{F}} \def\dgm{\mathrm{Dgm}} \def\pers{\mathrm{pers}} \def\Pers{\mathrm{Pers}} \def\lip{\mathrm{Lip}} \def\amp{\mathrm{Amp}} \def\meas{\mathcal{P}} \def\Fr{Fr\'echet } \def\Cech{{\rm Čech}} \def\Rips{{\rm Rips}} \def\nodes{\mathbf{N}} \def\image{\mathtt{Im}\,} \def\ker{\mathtt{Ker}\,} \def\ph{\mathbf{P}H} \def\bbr{\mathtt{Br}} \def\bm{\mathtt{B}} \def\bmv{\bm^v} \def\phpp{\mathbf{b}} \def\dimph{\mathrm{dim}_{\ph}} \def\class{\mathcal{F}} $

topology of point clouds

how do we capture the shape of a point cloud?

Looks like a sphere!
another one!

sublevel sets

It is quite clear that to eyeball the shape of a point cloud one needs to smudge the points, e.g. turn them into balls. Inevitably, the question of scale arises.



To recover the model (or at least to compute its topological invariants), i.e. the topological space underlying the observed point cloud, one needs to

  • Make sure that the topology of the union of the balls is close to the topology of the underlying topological space;
  • Provide an efficient procedure of computing topological invariants of the union of balls;
  • Solve the issue of scale.

simplicial approximation

We start with the second task: how to compute topological invariants of a union of balls. Working with the union directly would be a computational nightmare. As usual, topology proceeds by turning our continuous objects

into something discrete...

(hopefully preserving topological invariants).

Čech complex

Definition. Let $\{L_k\subset \R^d\}_{k\in V}$ be a finite collection of convex sets (for example, balls around a point cloud). The Čech complex of this collection consists of the simplices corresponding to non-empty intersections of the sets $L_k$. If the sets are $\alpha$-balls around a point cloud $\{x_k\}_{k\in V}$, the Čech complex $\Cech(\alpha)$ contains a simplex $\sigma \subseteq V$ iff there exists a point $x \in \R^d$ such that $\|x-x_k\| \leq \alpha$ for all vertices $k \in \sigma$.
Nerve Lemma: $$ \Cech(\alpha) \simeq \bigcup_{k\in V}{B_{\alpha}(x_k)} $$
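For instance, take three balls arranged around a central hole, pairwise intersecting but with empty triple intersection: the nerve is a hollow triangle, and both it and the union of the balls are homotopy equivalent to a circle, in accordance with the lemma.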

Vietoris-Rips complex

Computing the Čech complex might be difficult, especially in high dimensions: one needs to understand the geometry of the sets $L_k$, or to find witnesses, i.e. points $x\in\Real^d$ such that an $\alpha$-ball around $x$ contains the points of the sample forming a simplex. A somewhat more robust procedure, relying only on the pairwise distances, is given by the following

Definition. Let $L\subset \R^d$ be a finite set of points. For any real number $\alpha \geq 0$, the Vietoris-Rips complex of $L$, $\Rips(\alpha)$, contains all simplices $\sigma \subseteq L$ whose edges belong to $\Cech(\alpha)$. Clearly, $\Cech(\alpha)\subset\Rips(\alpha)$.
Vietoris-Rips Lemma

Let $L$ be a finite set of points in $\R^d$, and let $\alpha\geq 0$. Then $\Rips(\alpha)\subset\Cech(\sqrt{2}\alpha)$. (Indeed, a simplex of $\Rips(\alpha)$ has diameter at most $2\alpha$, and by Jung's theorem any set of diameter $2\alpha$ in $\R^d$ is contained in a ball of radius at most $\sqrt{2}\alpha$.)

However, unlike the Čech complex, the Vietoris-Rips complex need not be homotopy equivalent to the union of balls: for three points forming an equilateral triangle with side $s$ and $s/2\leq\alpha<s/\sqrt{3}$, $\Rips(\alpha)$ is a filled (contractible) triangle, while the union of $\alpha$-balls, like $\Cech(\alpha)$, is homotopy equivalent to a circle.

shapes from approximations

Now that we have seen that it is safe to replace the union of balls with its Čech or, sometimes, Vietoris-Rips complex, we turn to the question of how well the union of balls approximates the underlying topological space.
In most cases, the point cloud we are trying to analyze is a sample from some topological space, typically assumed to be a manifold.


There are several theorems which tell us that, under suitable conditions, the union of balls of a particular radius centered at the sample points captures the homotopy type of the manifold.

CCSL, NSW, etc.

The results ensuring that an offset of an approximation to a compact subset $K\subset \Real^d$ is homotopy equivalent to $K$ depend primarily on the degree of smoothness of $K$: the current strongest results, due to Chazal, Cohen-Steiner, and Lieutier, work for rather intricate $K$. The (by now) classical result of Niyogi, Smale, and Weinberger was formulated for smooth submanifolds of $\Real^d$ in terms of their feature size.

Hausmann-Latschev Theorem

It turns out that the Vietoris-Rips complex can also capture the homotopy type of the manifold.


Theorem. For a Riemannian manifold $\X$, there exists $\e_0>0$ such that for any finite subset $Y\subset \X$ and any $\e\leq \e_0$, the Rips complex $\Rips(\e, Y)$ is homotopy equivalent to $\X$, provided the Hausdorff distance between $Y$ and $\X$ is at most $\delta(\e)$.

It is also worth mentioning that the Vietoris-Rips lemma implies that the persistence diagrams of the Čech and Rips filtrations will be close.

handling jitter

Now we turn to the problem of scale:

(Snapshots of the union of balls at increasing radii: $\X_0, \X_1, \ldots, \X_6$.)


  • $\X_0$ - a bunch of connected components
  • $\X_1$ - a single connected component is left and six 1-cycles are born
  • $\X_2$ - five out of six 1-cycles die and two more are born
  • $\X_3$ - one more 1-cycle dies
  • $\X_4$ - one of the oldest 1-cycles dies, one more keeps going
  • $\X_5$ - going strong.
  • $\X_6$ - everybody's dead

persistent homology

So, we have a family of nested spaces (a filtration):

$$ \X_0\subset\X_1\subset\X_2\subset\X_3\subset\X_4\subset\X_5\subset\X_6 $$


We compute the homology of every space and look at the linear maps induced by the inclusions.

$\h_p(\X_0)\to\h_p(\X_1)\to\h_p(\X_2)\to\h_p(\X_3)\to\h_p(\X_4)\to\h_p(\X_5)\to\h_p(\X_6)$



The images and kernels of the linear maps $\phi_p^{i,j}:\h_p(\X_i)\to \h_p(\X_j)$ induced by inclusion reflect the chase of the cycles (chains without boundaries) by the boundaries (of chains of higher dimension). This chase can be abstracted into dry statistics: when each homology class is born and when it dies.
We get a collection of birth-death pairs.

persistence diagrams

We capture this evolution of homology in a filtration by representing each birth-death pair as a point in the plane.



Persistence diagrams are the predominant fingerprinting tool used in topological data analysis.
While initially motivated by recovering models for point clouds (i.e. sublevel sets of the distance function to the sample), persistent homology can be defined for any function with not too many critical points.

metric on persistence diagrams


To compare two models, we need to be able to compare persistence diagrams.
The bottleneck distance between the dimension-$\ell$ persistence diagrams of $f$ and $g$ is defined as $$ d_B\left(\dgm_{\ell}(f), \dgm_{\ell}(g)\right) = \inf_{\gamma}{\sup_{x\in\dgm_{\ell}(f)}{\|x-\gamma(x)\|_{\infty}}}, $$ where $\gamma$ ranges over all bijections from $\dgm_{\ell}(f)$ to $\dgm_{\ell}(g)$ (each diagram is taken to include the diagonal, so that off-diagonal points may be matched to it).
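For tiny diagrams, the infimum can be computed by brute force over bijections of the diagonally-augmented diagrams. The following sketch (our own naming, not a library API) illustrates the definition; optimized implementations are available in TDA software (e.g. Dionysus).

```python
# A brute-force sketch of the bottleneck distance for tiny diagrams
# (function and variable names are ours, not from any library).
from itertools import permutations

def diag_proj(p):
    # L_infinity-closest point of the diagonal to p
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def bottleneck_bruteforce(dgm_f, dgm_g):
    # Augment each diagram with the diagonal projections of the other's
    # points, so that every point may be matched "to the diagonal".
    X = list(dgm_f) + [diag_proj(q) for q in dgm_g]
    Y = list(dgm_g) + [diag_proj(p) for p in dgm_f]
    best = float('inf')
    for perm in permutations(range(len(Y))):
        cost = 0.0
        for i, j in enumerate(perm):
            if i >= len(dgm_f) and j >= len(dgm_g):
                continue  # two diagonal points match at zero cost
            d = max(abs(X[i][0] - Y[j][0]), abs(X[i][1] - Y[j][1]))
            cost = max(cost, d)
        best = min(best, cost)
    return best

# Example: one diagram has an extra low-persistence point.
print(bottleneck_bruteforce([(0, 4), (1, 2)], [(0, 4.5)]))  # prints 0.5
```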

Stability Theorem. Let $\mathbb{X}$ be a triangulable topological space, and let $f,g:\mathbb{X}\to \mathbb{R}$ be continuous tame functions. Then for each $\ell$

$$ d_B(\dgm_{\ell}(f), \dgm_{\ell}(g))\le \|f-g\|_{\infty} $$

computational aspects: Čech complex

Computing a Čech complex filtration for a point cloud is based on the following observation:

A simplex $\sigma$ belongs to $\Cech(\alpha)$ iff the radius of the smallest ball enclosing the vertices of $\sigma$ is at most $\alpha$.

This is computationally hard (the number of simplices explodes). One is often forced to reduce the sample size, or to use "witness" approximations.

The filtration is, in essence, a sorting of simplices by their $\alpha$-values.
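As an illustration, the following sketch (our own; a full implementation would use a general miniball routine such as Gärtner's) computes the $\alpha$-values of simplices of dimension at most two for points in the plane:

```python
# A minimal sketch of Cech filtration values for simplices of
# dimension <= 2 in the plane (our own illustration, not Dionysus' code).
# The value of a simplex is the radius of its smallest enclosing ball.
import numpy as np
from itertools import combinations

def cech_value(pts):
    pts = np.asarray(pts, dtype=float)
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return np.linalg.norm(pts[0] - pts[1]) / 2.0
    if len(pts) == 3:
        a, b, c = pts
        # If some edge's diametral ball contains the third vertex,
        # the smallest enclosing ball is that diametral ball.
        for p, q, r in ((a, b, c), (b, c, a), (a, c, b)):
            center = (p + q) / 2.0
            rad = np.linalg.norm(p - q) / 2.0
            if np.linalg.norm(r - center) <= rad + 1e-12:
                return rad
        # Otherwise it is the circumscribed ball: solve
        # |x-a|^2 = |x-b|^2 = |x-c|^2 for the circumcenter x.
        A = 2 * np.array([b - a, c - a])
        rhs = np.array([b @ b - a @ a, c @ c - a @ a])
        center = np.linalg.solve(A, rhs)
        return np.linalg.norm(center - a)
    raise NotImplementedError("higher simplices need a miniball routine")

def cech_filtration(points, max_dim=2):
    # Sort all simplices up to max_dim by their alpha-values;
    # ties are broken by dimension, so faces precede cofaces.
    simplices = []
    for k in range(1, max_dim + 2):
        for sigma in combinations(range(len(points)), k):
            simplices.append((cech_value([points[i] for i in sigma]), sigma))
    simplices.sort(key=lambda s: (s[0], len(s[1])))
    return simplices
```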

Construction of the Čech complex filtration is implemented in the Dionysus software (http://mrzv.org/software/dionysus/), which computes the smallest enclosing ball using the algorithm by Bernd Gärtner.

computational aspects: Vietoris-Rips complex

Computing the Vietoris-Rips filtration can be done similarly to the Čech complex filtration, but we no longer need the smallest enclosing ball: the value at which a simplex appears can be computed from the pairwise distances between its vertices.

However, since the Vietoris-Rips complex is a flag complex on its 1-skeleton, a more efficient computation can be achieved by adapting the Bron-Kerbosch algorithm for clique computation.
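Here is a minimal sketch of the flag-complex construction; it uses the simpler incremental-expansion approach (à la Zomorodian) instead of Bron-Kerbosch, and follows the convention above that an edge $\{u,v\}$ enters the filtration at $\alpha=d(u,v)/2$:

```python
# A sketch of the incremental-expansion construction of the Vietoris-Rips
# filtration (our own illustration; Dionysus adapts Bron-Kerbosch instead).
from itertools import combinations
import numpy as np
from scipy.spatial.distance import pdist, squareform

def rips_filtration(points, alpha_max, max_dim=2):
    D = squareform(pdist(np.asarray(points, dtype=float)))
    n = len(D)
    # lower[v] = earlier neighbors of v in the alpha_max-threshold graph
    lower = {v: [u for u in range(v) if D[u, v] / 2.0 <= alpha_max]
             for v in range(n)}
    simplices = []

    def add_cofaces(tau, nbrs):
        # a simplex appears when its longest edge does
        value = max((D[u, v] / 2.0 for u, v in combinations(tau, 2)),
                    default=0.0)
        simplices.append((value, tuple(sorted(tau))))
        if len(tau) <= max_dim:  # expand up to max_dim-simplices
            for v in nbrs:
                add_cofaces(tau + [v], [u for u in nbrs if u in lower[v]])

    for v in range(n):
        add_cofaces([v], lower[v])
    simplices.sort(key=lambda s: (s[0], len(s[1])))
    return simplices
```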

Construction of the Vietoris-Rips complex filtration using the Bron-Kerbosch algorithm is implemented in the Dionysus software (http://mrzv.org/software/dionysus/).

There is another implementation in the javaPlex software package (http://javaplex.github.io/javaplex/).

computational aspects: persistence

We assume that the simplices are ordered by dimension and by the time they appear in the filtration. Computing the persistence pairing is essentially a special case of Gaussian elimination.

We illustrate the reduction on a filtration of the complex with vertices $a,b,c,d$, edges $ab,bc,cd,ad,ac$, and triangles $abc,acd$. The boundary matrix over $\F_2$, with rows and columns in filtration order:

$$ \partial=\begin{array}{c|ccccccccccc} & a & b & c & d & ab & bc & cd & ad & ac & abc & acd\\\hline a & 0&0&0&0&1&0&0&1&1&0&0\\ b & 0&0&0&0&1&1&0&0&0&0&0\\ c & 0&0&0&0&0&1&1&0&1&0&0\\ d & 0&0&0&0&0&0&1&1&0&0&0\\ ab & 0&0&0&0&0&0&0&0&0&1&0\\ bc & 0&0&0&0&0&0&0&0&0&1&0\\ cd & 0&0&0&0&0&0&0&0&0&0&1\\ ad & 0&0&0&0&0&0&0&0&0&0&1\\ ac & 0&0&0&0&0&0&0&0&0&1&1\\ abc & 0&0&0&0&0&0&0&0&0&0&0\\ acd & 0&0&0&0&0&0&0&0&0&0&0 \end{array} $$

We process the columns left to right; whenever the lowest $1$ of the current column sits in the same row as the lowest $1$ of an earlier column, we add that earlier column (mod 2):

  • column $ad$: adding $cd$, then $bc$, then $ab$ zeroes the column (a 1-cycle is born);
  • column $ac$: adding $bc$, then $ab$ zeroes the column (another 1-cycle is born);
  • column $acd$: adding $abc$ moves the lowest $1$ to the row $ad$.

The resulting reduced matrix:

$$ R=\begin{array}{c|ccccccccccc} & a & b & c & d & ab & bc & cd & ad & ac & abc & acd\\\hline a & 0&0&0&0&1&0&0&0&0&0&0\\ b & 0&0&0&0&1&1&0&0&0&0&0\\ c & 0&0&0&0&0&1&1&0&0&0&0\\ d & 0&0&0&0&0&0&1&0&0&0&0\\ ab & 0&0&0&0&0&0&0&0&0&1&1\\ bc & 0&0&0&0&0&0&0&0&0&1&1\\ cd & 0&0&0&0&0&0&0&0&0&0&1\\ ad & 0&0&0&0&0&0&0&0&0&0&1\\ ac & 0&0&0&0&0&0&0&0&0&1&0\\ abc & 0&0&0&0&0&0&0&0&0&0&0\\ acd & 0&0&0&0&0&0&0&0&0&0&0 \end{array} $$

The positions of the lowest $1$s read off the persistence pairs $(b,ab)$, $(c,bc)$, $(d,cd)$, $(ac,abc)$, $(ad,acd)$; the vertex $a$ is never paired and represents an essential class in $\h_0$.
Both Dionysus (http://mrzv.org/software/dionysus/) and javaPlex (http://javaplex.github.io/javaplex/) implement the computation of the persistence pairing.
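For illustration, here is a minimal sketch (our own, not the code of either package) of the standard column reduction over $\F_2$, run on the worked example above:

```python
# A minimal sketch of the standard reduction algorithm over F_2.
def reduce_boundary_matrix(columns):
    """columns: list of sets of row indices, in filtration order.
    Reduces in place; returns the persistence pairs (birth, death)."""
    low_inv = {}  # lowest row index -> index of the column owning it
    pairs = []
    for j in range(len(columns)):
        col = set(columns[j])
        while col and max(col) in low_inv:
            col ^= columns[low_inv[max(col)]]  # add earlier column mod 2
        columns[j] = col
        # a zeroed column signals the birth of a cycle
        if col:
            low_inv[max(col)] = j
            pairs.append((max(col), j))
    return pairs

# The filtration from the worked example above:
names = ['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'ad', 'ac', 'abc', 'acd']
cols = [set(), set(), set(), set(),
        {0, 1}, {1, 2}, {2, 3}, {0, 3}, {0, 2},  # boundaries of the edges
        {4, 5, 8}, {6, 7, 8}]                    # boundaries of abc, acd
for birth, death in reduce_boundary_matrix(cols):
    print(names[birth], 'is paired with', names[death])
# prints b-ab, c-bc, d-cd, ac-abc, ad-acd; the vertex a stays unpaired
```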

The notion of persistent homology is useful for other functions as well, e.g. for effective de-noising.

dimensionality reduction and manifold learning

Manifold learning and non-linear dimensionality reduction have significant conceptual and methodological overlap with topological data analysis.
Here, one starts with some data, often without an underlying geometric model, and searches for an embedding into a Euclidean space of reasonable dimension to proceed.



Examples:

  • Point clouds in very high-dimensional spaces
  • Point clouds in spaces with categorical attributes
  • Networks

One can think of two ways to describe manifolds. One way (Poincaré) views a manifold as given by a system of equations in a Euclidean space:
$$ M=\left\{ \begin{array}{rl} f_1(x_1,\ldots,x_n)&=0\\ \cdots&\\ f_m(x_1,\ldots,x_n)&=0 \end{array}\right. $$
The alternative (nowadays common) definition of a manifold is the one with charts and transition maps (Riemann).

Most of dimensionality reduction and unsupervised learning can be rendered in terms of the Poincaré-Riemann paradigm.

In the Poincaré world, one starts with a point cloud in a Euclidean (or Banach) space,

$$ M\hookrightarrow\Real^D, $$
and aims mostly at making the dimension $D$ smaller, to reveal the structure of $M$.
One classical research thread in this vein stems from the famous Johnson-Lindenstrauss result:

Given a finite set $M\subset \Real^D$ of $m$ points and $\epsilon\gt 0$, there exists a (suitably rescaled) orthogonal projection $p:\Real^D\to\Real^d$ such that $$ 1-\epsilon\leq \frac{|p(x_i)-p(x_j)| }{ |x_i-x_j| }\leq 1+\epsilon $$ as long as $d\geq C{\log{m}}/{\epsilon^2}$ for a suitable absolute constant $C$.
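A quick numerical illustration (a sketch: a Gaussian random map stands in for the rescaled orthogonal projection, and the constant in the target dimension is ad hoc):

```python
# A numerical illustration of the Johnson-Lindenstrauss phenomenon.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
m, D, eps = 200, 10_000, 0.25
d = int(np.ceil(8 * np.log(m) / eps**2))  # target dimension ~ log m / eps^2

X = rng.normal(size=(m, D))               # a random point cloud in R^D
P = rng.normal(size=(D, d)) / np.sqrt(d)  # random projection, rescaled
Y = X @ P

ratios = pdist(Y) / pdist(X)              # distortion of pairwise distances
print(d, ratios.min(), ratios.max())      # typically within [1-eps, 1+eps]
```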

(Of course, there is no topological information that is captured/preserved here, but conceptually this result fits.)
One should mention that similar results exist for manifolds $M$.

In all these developments following up on the Johnson-Lindenstrauss result, one relies heavily on random projections and concentration-of-measure results, in essence providing low-distortion sketches of some isometric (or almost isometric) embedding into a Euclidean space.
In other words, in this situation one assumes that a good model for the data can be recovered from the linear functions in the ambient space.
The most prominent (and perhaps most widely used) tool of data analysis in this vein is PCA.

There are numerous versions of PCA generalizing it to non-linear functions of the ambient space. In other words, in lieu of
$$ M\hookrightarrow\Real^D\to \Real^d, $$ one considers
$$ M\hookrightarrow\Real^D\color{red}{\to\Real^{\mathbf{D}}}\to \Real^d. $$
Nonlinear PCA and kernel PCA are examples of such an approach. The common feature of PCA and its versions is that the embedding of $M$ into a low-dimensional space is sought among functions on the ambient space $\Real^D$ where $M$ resides.
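For instance, a brief sketch of this pattern using scikit-learn's KernelPCA (the dataset, kernel, and parameters here are arbitrary choices of ours):

```python
# A sketch of the pattern above: a kernel implicitly lifts the data to a
# high-dimensional feature space, where PCA is then performed.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
# a noisy circle in R^2, standing in for M inside R^D
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(300, 2))

Y = KernelPCA(n_components=2, kernel='rbf', gamma=2.0).fit_transform(X)
```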

A very different class of manifold learning approaches dispenses entirely with the ambient space: no functions of the ambient coordinates are used to derive an embedding $M\hookrightarrow \Real^d$.
Rather, one adopts the viewpoint (persistent in TDA) that only the proximity data amongst the points are relevant, and the (global) coordinate functions should be deduced from these local data.
One can see similarities with the Riemann's definition of a manifold.
One significant advantage of this class of methods is that they are well adapted to raw data in the form of networks or other discrete structures.

We list here a few of the popular tools to generate global coordinates from the local data. We can assume that the input to these tools is always a weighted graph $$ M=(V,E,w): E\subset V^2, w:E\to\Real_+, $$ derived, perhaps from an embedding into $\Real^D$.

  • In Isomap, one computes the path metric $d_M$ on $M$ (using the standard Dijkstra algorithm) and deploys multidimensional scaling, i.e. an embedding of $M$ into $\Real^d$ with the distance function in $\Real^d$ approximating $d_M$ optimally (see the sketch after this list).
    Conceptually, this method corresponds to the fact that any bounded metric space $M$ can be embedded into $l_\infty(M)$ by $$ x\mapsto d_M(x,\cdot). $$
  • Eigenmap involves the construction of a Laplace operator on $M$ from the weights $w$, and proceeds by taking as coordinate functions the low harmonics of the corresponding eigendecomposition of $l_2(M)$.
  • LLE, Locally Linear Embedding, stands a bit aside, as it does use the affine structure of the ambient space: one aims at an embedding $M\hookrightarrow \Real^d$ distorting the local affine structures (each point expressed as an affine combination of its neighbors) as little as possible.
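Below is a minimal Isomap-style sketch (our own simplification, not the reference implementation): the path metric is computed on a $k$-nearest-neighbor graph and fed to classical multidimensional scaling. It assumes the neighborhood graph is connected.

```python
# A minimal Isomap-style sketch: kNN graph -> path metric -> classical MDS.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def isomap_sketch(X, n_neighbors=8, d=2):
    D = squareform(pdist(np.asarray(X, dtype=float)))
    n = len(D)
    W = np.zeros_like(D)
    for i in range(n):  # keep each point's n_neighbors nearest edges
        nbrs = np.argsort(D[i])[1:n_neighbors + 1]
        W[i, nbrs] = D[i, nbrs]
    W = np.maximum(W, W.T)                         # symmetrize the graph
    G = shortest_path(csr_matrix(W), method='D')   # Dijkstra path metric d_M
    # classical multidimensional scaling on the path metric
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]               # top-d eigenpairs
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```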
