$\def\conf{\mathtt{conf}} \def\cov{\mathtt{cov}} \def\Int{\mathbb{Z}} \def\Real{\Bbb{R}} \def\Comp{\mathbb{C}} \def\ZN{\Int_N} \def\obst{{\mathcal{O}}} \def\dip{{\mathtt{P}}\vec{\mathtt{ath}}} \def\trans{\mathtt{Trans}} \def\U{{\mathcal U}} \def\Ed{{\mathcal E}} \def\Rd{{\Real}^d} \def\cN{{\mathbf N}} \def\OO{{\mathcal O}} \def\DD{{\mathcal D}} \def\HH{{\mathcal H}} \def\NN{{\mathcal N}} \def\VV{{\mathcal V}} \def\domain{{\mathcal{B}}} \def\ii{{\mathbf i}} \def\T{{\mathbb T}} \def\S{{\mathbb S}} \def\mod{{\mathit mod}} \def\Cycle{{\mathbf \sigma}} \def\prob{{\mathbb P}} \def\ex{{\mathbb E}} \def\codim{{\mathit{codim}}} \def\paths{{\mathcal P}} \def\sens{{\mathcal S}} \def\measurements{{\mathcal E}} \def\indep{{\mathcal I}} \def\one{\mathbb{I}} \def\dim{\mathtt{dim}} \def\nset{\mathbf{n}} \def\mset{\mathbf{m}} \def\Nat{\mathbb{N}} \def\types{\mathfrak{F}} \def\ideal{\mathcal{I}} \def\bz{\bar{z}} \def\bn{\bar{n}} \def\arrange{\mathcal{A}} \def\fld{\Bbbk} \def\attr{\mathfrak{A}} \def\circle{\mathbf{C}} \def\bx{\mathbf{x}} \def\br{\mathbf{r}} \def\tr{\mathtt{tr}} \def\S{\mathbb{S}} \def\ex{\mathbb{E}} \def\diag{\mathcal{D}} \def\noise{{\boldsymbol{\epsilon}}} \def\ex{\mathbb{E}} \def\var{\mathbb{V}} \def\coom{{\sc COOM}} \def\icl{\mathbf{T}} \def\iclm{{\mathcal{T}}} \def\itint{I}%\mathbf{I}} \def\s1{\mathbb{S}^1} \def\curve{\boldsymbol{\gamma}} \def\brm{\mathbb{B}} \def\mtree{\mathbf{T}} $

Topological Data Analysis in Dimension One

yuliy baryshnikov

University of Illinois at Urbana-Champaign
mathematics and ECE

supported by NSF and ARO

Reparamterization Invariant Methods

Topological Data Analysis in dimension one does not address the standard question of TDA, - what is the structure of the underlying space (model).
It rather aims at extracting the properties of the functions one obesrves on the space that are invariant under topology-preserving transformations: in dimension one, just the monotonous coordinate changes.

We will consider two independent models:

  • the merge trees associated to the time series, and
  • cyclic order recovery using iterated integrals.

Merge Trees

For a path-connected $X$ and a continuous real-valued $f$ function on $X$, define the Merge Tree $\mtree_f$ of $f$ as the topological space which as a set is \[ \mtree_f:=\amalg_{s} (X_s/\sim)\times \{s\}, \] where $\sim$ is the relation of being within the same path-connected component (so that $X_s/\sim$ is the set of the connected components of the sublevel set $X_s$).
The merge tree is equipped with the roughest topology such that the function taking the class of $(x,s)$ to $s$ is continuous.
In dimension one the merge trees need a bit of care (one first augments $f$ to a function of two variables, $F:=f(x)+y^2$, then proceed as above).
Or one can refer to the classical Harris (Dyck, etc) construction:
Most important fact about merge trees for univariate function is that
merge trees form a complete invariant of a continuous function, up to reparametrization.

Persistent Homology

Other popular set of reparametrization invariant features of an $\Real$-valued function are Persistent Homologies.

Collection of embeddings $X_s\hookrightarrow X_t, s\lt t$ (here $X_s=\{f\leq s\}$) lifts to a collection of consistent homomorphisms $H_km_{st}:H_k(s)\to H_k(t), k=0,1,\ldots$. These homomorphisms decompose in a natural way (pairing births and deaths of transient cycles).

We will identify persistence diagram with the sum of delta-functions weighted by the dimension of the bar. Resulting counting measure is a convenient tool to describe properties of the persistence diagram.

In general, the total mass of the persistence measures $PH_k$ can be infinite, - and is for generic Lipschitz or Holder functions on triangulable spaces with length metric (BW)

However, the contents of the displaced quadrants $\{b\leq b_*, d\geq d_*\}$ (i.e. the numbers of bars straddling the interval $[b_*,d_*]$) are finite for all $b_*\lt d_*$ for the so-called tame filtrations.

Continuous functions on compact metric spaces are tame, with Lipschitz or Holder functions having some tameness guarantees.

Persistent Homology in 1D: Merge Trees Etc

Persistence diagrams can be reconstructed from the merge tree using the following recursive procedure, relying on the "Elder Rule"
  • Find the longest increasing path, the stem, in the merge tree (that's the longest bar).
  • Remove the stem. A bunch of subtrees emerge.
  • Iterate the procedure on each of the resulting trees recursively.
The resulting pile of stems are the bars of $PH_0$.
The decomposition of a tree into a pile of stems erases a lot of information about the original function: the parentage relationships between stems are lost.
This is one source of ambiguity if one attempts to reconstruct the merge tree from its persistence diagram (pile of stems).

Another source of ambiguity (for univariate functions) is the fact that a stem can be attached to its parent on the right or on the left side.

Same bar, different chirality.

Test case: Brownian motion with drift

Consider Brownian motion with a drift, that is \[ f(t)=\brm_o(t)+mt. \] Almost surely, there is unique bar $(b_\infty,d_\infty)=(\inf_{t>0} f(t),\infty)$. Other bars are finite, forming to a random persistence diagram $PH_0$. Our interpretation of this persistence diagram is to view it as a {\em point process}, a random measure consisting of sum of $\delta_{(b,d)}$ over all bars $(b,d)$.

Tameness of the this point process follows from the general results that the persistence diagram of any $\alpha$-Holder function in dimension $d$ has PH-dimension of $d/\alpha$.

One gets an explicit expression for the intensity measure $\ex PH_0$ for the point process of zero-dimensional persistence from standard Brownian motion with drift: The intensity measure of $PH_o$ is supported by the halfplane above diagonal $b\lt d$, and is given by the density \[ \mu_{ph_0}=\frac{4m^2e^{2m\Delta}(1+e^{2m\Delta})}{(e^{2m\Delta}-1)^3}, \] where $\Delta=d-b$.

In particular, near the diagonal the density explodes as $1/m(d-b)^3$.

What about chiralities?

Conditioned on the number $k$ of bars straddling an interval $I$, in a trajectory of $\brm+{mt}, m\gt 0$, the expected excess of F over M is equal to the expected number of records in a random $k$-permutation, that is $k$-th harmonic sum, \[ \ex(\mathbf{F}-\mathbf{M})=H_k=1+1/2+\ldots+1/k. \] For Brownian motion with drift $m$ , the expected excess $\ex(\mathbf{F}-\mathbf{M})$ in the bars straddling an interval of length $\Delta$ is \[ -\log(1-\exp(-2m\Delta)). \]

Some Experiments

We looked at some financial data.

As is well known, stock valuations, - perhaps, - cannot be modeled as exponentials of Brownian motions with a drift. If they were, however, the merge tree distribution of the detrended log trajectories would exhibit time reversal invariance. So, Rachel Levanger (at U Penn at the time) analysed hundreds of the stock traces.

Here is a typical output:
Cumulatively, the distributions of the F-over-M excess is noticeable.

It would be interesting to understand what makes a firm more or less F-over-M prone. Many other questions are open, as well...

shifting gears!

Iterated Integrals and Cyclic Phenomena

All periodic phenomena are cyclic, but not vice versa.
The distinction is important. There are plenty of processes that are manifestly cyclic, but not periodic, in the sense that no period \(P\) can be found, such that shift of the process along the time axis by $P$ leaves the process, or its parameters, invariant.

Examples of cyclic yet not periodic processes include cardiac cycle; muscle-skeletal complex movements exercised during a gait; population dynamics in closed ecosystems; business cycles and many others.

Coupled dynamics of moose and wolve populations on Isle Royale

Any cyclic phenomenon has a circle as the underlying state space. We will be referring to any parameter on the circle as the internal clock.

What we observe and measure is however parameterized not by the the internal clocks of the systems we study: the observations are parameterized by the physical time. The internal clocks do not necessarily coincide with the physical time progression, not even after some linear reparametrization.

In general, if the internal clock cannot be represented as a sum of a linear and a periodic function of the time, the cyclic phenomena cannot be made periodic by any reparametrization.

The predominant focus on periodic processes can be explained by the availability of powerful tools exploiting harmonic analysis (Fourier/Cosine transforms, wavelets etc).

These tools are however intrinsically limited the fact that they rely on the structure of the timeline as a homogenous space; without this structure, the natural bases for the decompositions - consisting of characters of the shift group - just do not exist.

Iterated Path Integrals

From now on, we will assume that the range of the observed processes is a $d$-dimensional Euclidean space $V\cong\Real^d$, with coordinates $x_j, j=1,\ldots d$.

Before attacking the specific problem of recovering the cyclic nature of the observed processes, we will address the general question, of reparametrization invariant functionals of trajectories (that is mappings $\curve:I=[0,1]\to V$):

what are the functions of trajectories that do not depend on how one traverses them?

This question is classical, and has an answer in terms of the iterated path integrals, a notion going back to Picard; reintroduced into the modern mathematical practice by K.-T. Chen.

Iterated integrals are functions of the trajectories defined recursively. The linear space of iterateg integrals is filtered by their order.

The integrals of an order $0$ are constants.
The integrals of order $k$ are given by \[ a(\curve)=\sum_l \int_0^1 a_l(\curve([0,t]))d\curve_l(t), \] where all functions $a_l$ are iterated integrals of order $\lt k$.

Equivalently, the space of iterated path integrals (for smooth trajectories) is spanned by \[ a(\curve):=\int\cdots\int_{0\lt t_1\lt\ldots\lt t_k\lt 1} \dot{\curve}_{l_{1}}(t_1)\cdots \dot{\curve}_{l_k}(t_k)dt_1\ldots dt_k. \]

Iterated integrals are reparametrization invariant.

They are also unaffected by detours.
(We say that a path $\curve$ has a detour if it can be represented as \[ \curve_s\curve_d\curve_d^{-1}\curve_f, \] a piece of trajectory traveled back and forth.)

A theorem by K.-T. Chen implies that they form a complete algebra of functions on trajectories modulo reparametrizations and detours:
Two $\Real^d$-valued trajectories (without detours) have same iterated integrals of all orders, if and only if the trajectories coincide up to a reparametrization and shift.

This suggests that any exploratory analysis of parametrization-independent features of time series can rely on these functionals as a rich enough tool.

Over the past few years, the idea to use iterated integrals as a tool in data analysis has been promoted by T. Lyons and his students..

Algebraic Areas

The first order iterated integrals just measure the total increments.
The second order integrals turned out to be very helpful in the problem of recovery of cyclic phenomena.

Recall that second order iterated integrals measure the algebraic areas encompassed by the trajectory.

The space of iterated integral of the second order is spanned by \[ A_{kl}:=\frac{1}{2}\int \curve_l d \curve_k - \curve_k d \curve_l. \] Clearly, these integrals are equal to the oriented areas of the two-dimensional projections of the trajectory $\curve$ on the coordinate planes (with starting and ending points connected by a straight segment).

One of the patterns we aim to capture is the leader-follower relationship.

Plotting the functions shown on the left against each other produces a curve encircling a positive oriented area (on the right).

The key observation is the following: if one effect precedes another, the oriented area on the parametric joint plot of the corresponding functions will surround a positive algebraic area.

Of course, the semantics of this assumes that $k$ and $l$ are consubstantial; if their nature is antagonistic, the lead-follow relation flips.

We take this consideration as our primitive on which to build the procedures of data analysis.

We arrange the iterated integrals \[ A_{kl}:=\frac{1}{2}\int_{t} \curve_l d \curve_k - \curve_k d \curve_l \] into a skew-symmetric matrix $A$.

The skew-symmetric matrix $$ A=(A_{kl})_{1\leq k,l\leq d} $$ is called the lead matrix.

An interpretation of the lead matrix coefficients is that an element $A_{kl}$ is positive if $l$ follows $k$.

Chain of Offsets Model

Specializing, we concentrate on a class of cyclic phenomena described by the chain of offsets model.

Under this model, the observed variables are (noisy and perhaps rescaled versions of) the same function on the circle $\circle$: \[ \curve_l(\tau)=c_l \phi(\tau - \alpha_l), l=1,\ldots,d, \] where $\phi$ is the archetypal behavior, $c_l\gt 0$ some scaling parameters, and $\alpha_l$ are some phases.

Informally, under the chain of offsets model, the observables are retracing the same behavior, with some delays.

Important to emphasize that even assuming circle as the state space model, we rely on explicitly reparametrization invariant methods, making the approach stable with respect to arbitrary changes in time variable.

The central question we will address now is the recovery the cyclic order of the functions $\curve_l, l=1,\ldots,d$, (or, equivalently, the cyclic order of $\alpha_l, l=1,\ldots,d$) from the (perhaps, noisy version) of the trajectory, $\curve+\noise$, the sampled lead matrix.

Expand the primary function $\phi$ (defined on circle $\circle$ parameterized by the internal clock $\icl$) into the Fourier series, \begin{equation} \label{eq:phi} \phi(\tau)=\sum_k c_k \exp(2\pi i k \tau). \end{equation}

Then it is immediate that

The lead matrix over a period $A$ is given by $$ A=\left(\psi(\alpha_k-\alpha_l)\right)_{kl} $$ where \[ \psi(t)=2\pi\sum_m m|c_m|^2\sin(mt). \]

Statistical Interlude

To be able to use the lead matrix in applications, one needs some statistical guarantees of its recoverability from noisy observation, \[ \curve^\noise(\tau)=\curve(\tau)+\noise(\tau). \]

A naive procedure of taking the sampled lead matrix with coefficients \[ \left(A(\curve^\noise_{a,b})\right)_{kl}:=\frac12\int_a^b \curve^\noise_k(t) d\curve^\noise_l(t)-\curve^\noise_l(t)d \curve^\noise_k(t) \] and averaging over time might easily fail.

The reason is, essentially, the linear growth of the sample algebraic areas for many models of noise (such as Brownian loops):

Levy algebraic area formula \[ p(a)=\frac{\pi}{2T}\cosh^{-2}(a/T) \] shows that the algebraic area for Brownian loops grows linearly in time, just as the algebraic areas of a periodic process does.

Still, a modification of the procedure can overcome this difficulty.

If the noise process is time reversible (sufficient, not necessary) and its algebraic area process is ergodic enough (say, has exponential decay of correlations), then partitioned sampled lead matrix converges to its noiseless version.

Let $l_1, l_2, l_3,\ldots$ a sequence of the interval lengths, such that $l_k\to\infty; l_k/L_k=0$, where $L_k=\sum_{j\leq k} l_j$. Define \[ \bar{A}^\noise(k)=\frac{1}{L_k}\sum_{j\leq k} A(\curve^\noise_{L_{j-1},L_j}). \] Then the normalized sampled lead matrices converge to the normalized lead matrix over one period, \[ \bar{A}^\noise(k)/\parallel \bar{A}^\noise(k)\parallel \to A^p/\parallel A^p\parallel \]

This law of large numbers proves the consistency of the partitioned sample lead matrices estimator in our model. We will be using therefore the lead matrices derived from the observed traces as an approximation to actual ones.

Lead Matrices and Their Spectra

A natural simplifying assumption, not implausible in applications, is that the underlying signal is dominated by the leading coefficient: $|c_1|^2$ is much larger than $\sum_{|k|\geq 2} |c_k|^2$ (we ignore the constant term, as it is irrelevant to the cyclic behavior).

If $\phi$ is just a single harmonic, then the skew symmetric matrix $A$ would have rank two: spanned by the two rank one matrices, $A^\pm$, with $$ A^+_{kl}=a_ka_l \sin(\alpha_k)\cos(\alpha_l) $$ and $$ A^-_{kl}=a_ka_l \cos(\alpha_k)\sin(\alpha_l). $$

Even if the function $\phi$ is not harmonic, and the observations are noisy, one still can expect, that if the sampled lead matrix approximates the one period lead matrix well, and the leading harmonic is dominating $\phi$, then $\bar{A}$ is well approximated (in Frobenius norm) by the rank 2 matrix $P\bar{A}P$.
(Here $P$ is the projection on the 2-plane spanned by the eigenvectors corresponding to the largest in absolute value purely imaginary eigenvalues.)

One can prove that the components of the nontrivial eigenvectors of this rank 2 matrix are given by a (real linear) transformation of \[ v_1=e^{i\alpha}\left(e^{2\pi i\alpha_1},e^{2\pi i\alpha_2},\ldots,e^{2\pi i\alpha_d}\right), v_2=\overline{v}_1 \] (for some phase $\alpha$).

In other words, the arguments of the components of the leading eigenvectors of the lead matrix reconstruct the cyclic order of the components.

The spectrum of the sampled lead matrix by itself can serve as an indicator of the explanatory power of the cyclicity algorithm.

In general, the spectrum of the skew-symmetric lead matrix consists of purely imaginary pairs $\pm i\lambda_j$, and zeros. We assume that the absolute values of the spectral pairs are ordered, $$ \lambda_1=\lambda_2\geq \lambda_3\ldots $$ The ratio $\lambda_1/\lambda_3$ indicates how well the lead matrix can be described by the noiseless harmonic model (for which the ratio is $+\infty$).

Example: Simulated Data

We combined the ideas outlined above into a pipe, resulting in a software tool for recovering the cyclic structure among the observed time series, under assumption of the chain of offsets model.

The workflow consists of

  • Normalizing the data (and partitioning, if highly correlated noise is suspected);
  • Forming the sampled lead matrix;
  • Spectral analysis of the lead matrix (dominance of the leading eigenvalue an indicator of the validity of the chain of offsets model);
  • Recovery of the cyclic order from the cyclic order of coefficients of the leading eigenvector.

Some examples follow.

We generated $d=12$ traces of harmonic functions $\sin(t-\alpha_k)$ with random phase offsets (cyclically ordered), and sampled them, at $625$ samples per period for $16$ periods. We can see the banded, nearly cyclic lead matrix, the spectrum having just two nontrivial (complex conjugate) eigenvalues, and the components of the leading eigenvector close to the circle of radius $1/\sqrt{d}$.

Next, consider a similar example, adding to each of the samples an iid normally distributed value with SNR $\sigma\approx1.42$

The recovery of the cyclic ordering is almost perfect.

Example: Business Activity Cycles

The US economy goes through highly aperiodic cycles of business activity. At the aggregate level, the key indicator is the GDP growth, but many correlates of the business activity exist, most notably, the unemployment data, bonds returns etc.

The data are notoriously noisy.

Investors long sought to exploit the return dependences of the stock returns on the stages of the business cycle.

We considered the sectorial indices (as reported by Standard and Poor) for the daily closings, for the past ten years.

These are the aggregates corresponding to the subsets of stock values tracked by S&P as a part of their industrial portfolio.

It is "well known" that these industrial sectors exhibit some kind of cyclic behavior (some showing better returns at the start of a business cycle; others - towards its end).

Here are the standard definition of the industries:

  • BM Materials
  • CY Consumer Cyclical
  • EN Energy
  • FN Financial
  • HC Health Care
  • IN Industrials
  • NC Consumer Staples (Non-Cyclical)
  • TC Technology
  • TL Telecommunications
  • UT Utilities

Assuming the chain of offset model, we analyzed 10 time series of the industrial sectors. The data were downloaded from finance.google.com, $\log$-ed, detrended and normalized. The results are shown below.

The resulting (cyclic) order, shown below, is in a close agreement with the financial "folklore knowledge":

Cyclicity Fidelity

Example: Human Connectome Project Data

We used the fMRI data supplied by the Human Connectome Project, a large consortium of research centers, based at the Washington University in St. Louis, aiming at aggregating high quality annotated data. In this study, we pulled 200 fMRI traces of resting subjects.

The subjects were scanned using a customized Siemens MAGNETOM Skyra 3T at Washington University in St. Louis Missouri using the following parameters

  • Repetition time (TR): 720 ms
  • Voxel dimension:$2\times 2\times 2$ mm
  • Frames per run: 1200 (total run duration = 14m33s)

For each of the subjects, the data were then aggregated into 33 voxel groups, the ROIs, corresponding to anatomically identified regions of the brain, often with well-defined functions associated.

For each of the subjects, our cyclicity analysis was run, yielding a collection of lead matrices ($33\times 33$ skew-symmetric matrices); the implied cyclic permutations and and vectors of the amplitudes of the components of the leading eigenvalues, serving as a proxy for the signal strength.

Predictive power of the leading eigenvector in describing the cyclic structure of the ROIs in the time series can be deduced, heuristically, from the ratio of the absolute value of the leading eigenvalue over the larges of the absolute values of the remaining components: if the reordered absolute values are $|\lambda_1|=|\lambda_2|>|\lambda_3|=|\lambda_4|>\ldots$, then the quality of the ordering defined by the components of $u_1$ is $|\lambda_1|/|\lambda_3|$.

Two representative sets of results (out of 200) are shown here. A subject with a high $|\lambda_1/\lambda_3|$ ratio and, correspondingly, with strongly banded lead matrix after reordering the ROIs according to their phases in the leading eigenvector.

The results of another subject, shown below, are far less clear, with relatively low $\lambda_1/\lambda_3$ ratio and essentially absent banded structure of the lead matrix in the reordered basis.

Still, the majority of subjects showed a large $\lambda_1/\lambda_3$ ratio.

Our goal in this study was to understand whether there exist some common cyclic process, a consciousness pattern, that can be detected in a large number of the subjects.

We identified a group of ROIs consistently exhibiting high signal. These results were then used to generate the group of 10 most pronounced ROIs:

  • l cuneus
  • l primary visual cortex
  • l primary auditory cortex
  • precuneus
  • r primary visual cortex
  • r cuneus
  • l ventral intraparietal sulcus
  • l posterior intraparietal sulcus
  • r ventral intraparietal sulcus
  • r posterior intraparietal sulcus

A striking pattern of the data is the total dominance of the left primary auditory cortex in as compared to the right primary auditory cortex which hardly shows up at all. Other symmetric regions appear in pairs so this contrast is indicative of strong asymmetry either in the functionality (implausible) or in stimuli.

The lead matrix provides an insight into the temporal, cyclic behavior of the interrelated time series. The patterns of cyclic behavior that are observed universally, or which are prevalent in a population are of especial interest. Again, the left primary auditory cortex is a dominant ROI, represented in most of the frequent triples.

These results strongly suggest the triggering of what is assumed to be default network manifestations by an external stimulus.

Example: Objective Biomarkers of Tinnitus

Project with BJ Zimmerman, I Abraham, SA Schmidt, FT Husain, Network Neuroscience, 2018.