$\def\conf{\mathtt{conf}} \def\cov{\mathtt{cov}} \def\Int{\mathbb{Z}} \def\Real{\Bbb{R}} \def\Comp{\mathbb{C}} \def\ZN{\Int_N} \def\obst{{\mathcal{O}}} \def\dip{{\mathtt{P}}\vec{\mathtt{ath}}} \def\trans{\mathtt{Trans}} \def\U{{\mathcal U}} \def\Ed{{\mathcal E}} \def\Rd{{\Real}^d} \def\cN{{\mathbf N}} \def\OO{{\mathcal O}} \def\DD{{\mathcal D}} \def\HH{{\mathcal H}} \def\NN{{\mathcal N}} \def\VV{{\mathcal V}} \def\domain{{\mathcal{B}}} \def\ii{{\mathbf i}} \def\T{{\mathbb T}} \def\S{{\mathbb S}} \def\mod{{\mathit mod}} \def\Cycle{{\mathbf \sigma}} \def\prob{{\mathbb P}} \def\ex{{\mathbb E}} \def\codim{{\mathit{codim}}} \def\paths{{\mathcal P}} \def\sens{{\mathcal S}} \def\measurements{{\mathcal E}} \def\indep{{\mathcal I}} \def\one{\mathbb{I}} \def\dim{\mathtt{dim}} \def\nset{\mathbf{n}} \def\mset{\mathbf{m}} \def\Nat{\mathbb{N}} \def\types{\mathfrak{F}} \def\ideal{\mathcal{I}} \def\bz{\bar{z}} \def\bn{\bar{n}} \def\arrange{\mathcal{A}} \def\fld{\Bbbk} \def\attr{\mathfrak{A}} \def\circle{\mathbf{C}} \def\bx{\mathbf{x}} \def\br{\mathbf{r}} \def\tr{\mathtt{tr}} \def\S{\mathbb{S}} \def\ex{\mathbb{E}} \def\diag{\mathcal{D}} \def\noise{{\boldsymbol{\epsilon}}} \def\ex{\mathbb{E}} \def\var{\mathbb{V}} \def\coom{{\sc COOM}} \def\icl{\mathbf{T}} \def\iclm{{\mathcal{T}}} \def\itint{I}%\mathbf{I}} \def\s1{\mathbb{S}^1} \def\curve{\boldsymbol{\gamma}}$

### Introduction

Consider some function of time, $\mathbf{x}:T\to\mathbb{R}^d$, describing a certain phenomenon. What this function tells us, if we are very uncertain about the time parametrization?

In other words, we are forced to assume that our time readings are a corrupted, reparameterized version of the actual clock, perhaps even with backtracking: $\mathbf{x}(t)=\mathbf{X}(\tau(t)), \tau:\mathbb{R}\to\mathbb{R}.$
Under such assumptions, what can we learn about the phenomenon?

The problem of finding reparametrization invariant descriptors of time series emerged from a very practical observation:

All periodic phenomena are cyclic, but not vice versa.

The distinction is important. There are plenty of processes that are manifestly cyclic, but not periodic, in the sense that no period $P$ can be found, such that the shift of the process along the time axis by $P$ leaves the process, or its parameters, invariant.

Examples of cyclic yet not periodic processes include cardiac cycle; muscle-skeletal complex movements exercised during a gait; population dynamics in closed ecosystems; business cycles and many others.

The predominant focus on periodic processes can be explained by the availability of powerful tools exploiting harmonic analysis (Fourier/Laplace/Cosine transforms, wavelets etc).

These tools are however intrinsically limited by the fact that they rely on the structure of the timeline as a homogenous space.

Without this structure, the natural bases of harmonic analysis - consisting of characters of the group of shifts - just do not exist.
$\begin{array}{l} s_t:x\mapsto x+P\\ s_t^*e_\alpha=c_\alpha(t)e_\alpha \end{array}$

### Iterated Integrals

Consider multi-dimensional time series: the trajectories with values in a $d$-dimensional Euclidean space $V\cong\Real^d$, with coordinates $x_j, j=1,\ldots d$.

Before attacking the specific problem of recovering the cyclic nature of the observed processes, we will address the general question, of reparametrization invariant functionals of trajectories (that is mappings $\curve:I=[0,1]\to V$):

what are the descriptors, - functions of trajectories, - that do not depend on how one traverses them?

what are the reparametrization-invariant functionals of trajectories?

This question is classical, and has an answer in terms of the iterated path integrals, a notion going back to Picard; reintroduced into the modern mathematical practice by K.-T. Chen (while he was working at the University of Illinois at Urbana-Champaign).

K.-T. Chen we using them to understand some topological features of maps between manifolds; our applications do ot need any topological context, however.

Iterated integrals are functions of the trajectories defined recursively: integrals of higher orders via the integrals of lower order. They form a filtered space of functions on the time interval where the trajectories are defined.

The integrals of an order $0$ are the constants.

The integrals of order $k$ are given by $a(\curve)(t)=\int_0^t \sum_l a_l(s)d\curve_l(s),$ where all $a_l$'s are iterated integrals of order $\lt k$.

Equivalently, the space of iterated path integrals (for smooth trajectories) is spanned by $a(\curve):=\int\cdots\int_{0\lt t_1\lt\ldots\lt t_k\lt 1} \dot{\curve}_{l_{1}}(t_1)\cdots \dot{\curve}_{l_k}(t_k)dt_1\ldots dt_k.$

In particular, the integrals of order one are increments over the interval of observation, and we'll spend some time dealing with the second order integrals...

Iterated integrals are reparametrization invariant.

They are also unaffected by detours. (We say that a path $\curve$ has a detour if it can be represented as $\curve_s\curve_d\curve_d^{-1}\curve_f,$ a piece of trajectory traveled back and forth.)

A theorem by Chen implies an especially attractive feature of the iterated integrals - up to detours and reparametrizations, they form a complete algebra of functions on trajectories:

Two trajectories (without detours) have same iterated integrals of all orders, if and only if the trajectories coincide up to a reparametrization and shift in $V$.

The dimensions of the space of iterated integrals of a given order grow pretty fast, and their structure is related to some classical algebra and combinatorics (Hall bases, Baker-Campbell-Hausdorff-Dynkin formula,...), - we won't go there.

To reiterate:

Two trajectories $\curve,\curve':[0,T]\to V$ (without detours) have the same iterated integrals of all orders, if and only if the trajectories coincide up to a reparametrization and shift.

This suggests that any exploratory data analysis of parametrization-independent features of time series can rely on these functionals: they form a rich enough tool.
Just create a zillion of iterated integrals, and stare at them.

Remark: Over the past few years, the idea to use iterated integrals as a tool in data analysis has been promoted by T. Lyons and his students under the name of path signatures.

### Algebraic Areas

So now we know that the iterated integrals contain all the information about the trajectory.
But it is always good to have an interpretation for the readings of your sensors.

The first order iterated integrals just measure the total increments.

The second order integrals also have some clear interpretation.

Namely, the second order iterated integrals (mod first order integrals) are the algebraic areas encircled by the projections of the trajectory to a 2D plane.

The space of iterated integral of the second order is spanned by $A_{kl}:=\frac{1}{2}\int_{t} \curve_l d \curve_k - \curve_k d \curve_l,$ the oriented areas of the two-dimensional projections of the trajectory $\curve$ on the coordinate planes (with starting and ending points connected by a straight segment).

So what these oriented areas mean practically?

Here is our key heuristics: if one effect precedes another, the oriented area on the parametric joint plot of the corresponding functions will surround a positive algebraic area.

If two waves are following one another, the oriented area encircled by their parametric plot is positive or negative, depending on their temporal order.
Of course, the semantics of this assumes that $k$ and $l$ are consubstantial; if their nature is antagonistic, the lead-follow relation flips.

We take this idea as our primitive on which we will build.

Collectively, the iterated integrals of order 2, - oriented areas of the projections of the trajectory to the 2D coordinate planes - capture is the pairwise leader-follower relationship.

We arrange the iterated integrals $A_{kl}:=\frac{1}{2}\int_{t} \curve_l d \curve_k - \curve_k d \curve_l$ into a skew-symmetric matrix $$A=(A_{kl})_{1\leq k,l\leq d}$$ called the lead matrix.

The interpretation of the lead matrix coefficients is that an element $A_{kl}$ is positive if $l$ follows $k$, negative if $l$ leads $k$.

### Chain of Offsets Model

Specializing, we concentrate on a class of phenomena described by the chain of offsets model.

Under this model, the observed variables are (perhaps noisy and perhaps rescaled shifts) the same function on the 1D time manifold, - either the real line or the circle: $\curve_l(\tau)=c_l \phi(\tau - \alpha_l), l=1,\ldots,d.$ Here $\phi$ is the archetypal behavior, $c_l\gt 0$ some scaling parameters, and $\alpha_l$ are some phases.

Informally, under the chain of offsets model, the observables are retracing the same behavior, with some delays.

Even though we assume homogeneity of the domain for the observed functions in our model, we rely on explicitly reparametrization invariant methods in deriveing any inferences.

The central question we will address now is the recovery the order of the functions $\curve_l, l=1,\ldots,d$, (or, equivalently, recovery of the offsets $\alpha_l, l=1,\ldots,d$) from the trajectory. This is done using spectral analysis of the sampled lead matrix.

For some baseline situations the structure of the lead matrix is easy to describe.

Expand the primary function $\phi$ into the Fourier series, $$\label{eq:phi} \phi(\tau)=\sum_k c_k \exp(2\pi i k \tau).$$

Then it is immediate that the lead matrix over a period $A$ is given by $$A=\left(\psi(\alpha_k-\alpha_l)\right)_{kl}$$ where the lead kernel $\psi$ is given by $\psi(t)=2\pi\sum_m m|c_m|^2\sin(mt).$

Equivalently, the entries of the lead matrix are (proportional to) the derivative of the symmetric convolution of $\phi$: $\psi\sim \frac{d}{dt}\left(\int \phi(\tau)\phi(\tau-t)d \tau\right).$

The same relation holds for the functions on the real line.

### Lead Matrices and Their Spectra

Often the lead kernel is dominated by a single harmonic, - i.e. $m|c_m|^2$ is much larger than other terms (this is the case, e.g when $\phi$ itself is just a single harmonic).

Then the skew symmetric matrix $A$ would have rank two, spanned by $A^\pm$, where $$A^+_{kl}=c_kc_l \sin(\alpha_k)\cos(\alpha_l)$$ and $$A^-_{kl}=c_kc_l \cos(\alpha_k)\sin(\alpha_l).$$

As these vectors reflect the intrinsic order of the signals (say, a wave propagating through the medium, measured by an array of sensors), the spectral decomposition would reconstruct it.

A skew-symmetric matrix of rank 2 has the form $u\otimes v^\dagger-v\otimes u^\dagger$ and its nontrivial eigenvectors are in the space spanned by $u$ and $v$.

For example, in the case when $u,v$ are orthogonal the eigenvectors are $(1\mp i)|v|u+(1\pm i)|u|v.$

Hence, the corresponding (complex conjugated) eigenvectors of this rank 2 matrix are given by linear combinations of $v_1=e^{i\alpha}\left(e^{2\pi i\alpha_1},e^{2\pi i\alpha_2},\ldots,e^{2\pi i\alpha_d}\right), v_2=\overline{v}_1$ (for some phase $\alpha$).

In other words, one can reconstruct the cyclic order of the components.

Spectral decomposition of $A$ and the arguments of the components of the first eigenvector would then result in the desired cyclic order.

The spectrum of the sampled lead matrix by itself can serve as an indicator of the explanatory power of the cyclicity algorithm. In general, the spectrum of the skew-symmetric lead matrix consists of purely imaginary pairs $\pm i\lambda_j$, and zeros.

We assume that the absolute values of the spectral pairs are ordered, $$\lambda_1=\lambda_2\geq \lambda_3\ldots$$ The ratio $\lambda_1/\lambda_3$ indicates how well the lead matrix can be described by the noiseless harmonic model (for which the ratio is $+\infty$).

The situation becomes even more interesting for the chain of offsets model on the real line.

Consider a chain of offsets forming an arithmetic progression for the sinal that is a solitary pulse.
In this case, the lead matrix is a skew symmetrix Toeplitz matrix.

Approximating it by the convolution with the lead kernel shows that its leading eigenvalue will correspond to the maximum of $s|\psi(s)|^2$, the Fourier transform of the lead kernel, and the corresponding eigenfunction is the harmonic at that frequency.

The components of the eigenvector corresponging to the leading eigenvalue wind around the origin, as expected.

Chaining them in the visually obvious fashion provides a perfect reconstruction of the offset order:

If the lead matrix is noisy, the recovery of the natural order of the offsets becomes much more problematic:

Can one reconstruct the order of the offset from the data?

We can use the slice and splice technique familiar from DNA sequencing or (in TDA) in Mapper algorithm.

Partition the components of the eigenvector by overlapping sectors. Within each sector, cluster the components using the absolute values of the lead matrix as the proximity measures: the higher the entry, the closer the components.

Integrating the resulting partial orders gives the overall ordering.

Integrating the resulting partial orders gives the overall ordering.

### Examples

We combined the ideas outlined above into a pipe, resulting in a software tool for recovering the cyclic structure among the observed time series, under assumption of the chain of offsets model.

The workflow consists of

• Normalizing the data (and partitioning, if highly correlated noise is suspected);
• Forming the sampled lead matrix;
• Spectral analysis of the lead matrix (dominance of the leading eigenvalue an indicator of the validity of the chain of offsets model);
• Recovery of the ordering of the traces from the aggregated partial orders of the phases of the leading eigenvector components.

Some examples follow.

### Signal Propagation in Network

These results were obtained by Ivan Abraham (UIUC).

We consider a simple model (first passage percolation, gossip, ...) of signal propagation in a network.

A node broadcasts a sequence of pulse-like signals to its neighbors who pass it on.

Can we reconstruct the distances from the seed node by computing the iterated integrals?

We ran the standard pipe: eigenvalue decomposition, selection of the eigenvectors corresponding to leading eigenvalue.

One can see the intermediate results below:

The output of the reordering is shown below:

One interesting way to look at the data is to track the accumulation of the oriented areas in time:

The US economy goes through highly aperiodic cycles of business activity. At the aggregate level, the key indicator is the GDP growth, but many correlates of the business activity exist, most notably, the unemployment data, bonds returns etc.

The data are notoriously noisy.

Investors long sought to exploit the return dependences of the stock returns on the stages of the business cycle.

We considered the sectorial indices (as reported by Standard and Poor) for the daily closings, for the years 2003-2012.

These are the aggregates corresponding to the subsets of stock values tracked by S&P as a part of their industrial portfolio.

It is "well known" that these industrial sectors exhibit some kind of cyclic behavior (some showing better returns at the start of a business cycle; others - towards its end).

Here are the standard definition of the industries:

 BM Materials CY Consumer Cyclical EN Energy FN Financial HC Health Care IN Industrials NC Consumer Staples (Non-Cyclical) TC Technology TL Telecommunications UT Utilities

Assuming the chain of offset model, we analyzed 10 time series of the industrial sectors. The data were downloaded from finance.google.com, $\log$-ed, detrended and normalized. The results are shown below.

The resulting (cyclic) order, shown below, is in a close agreement with the financial "folklore knowledge":

Cyclicity Fidelity
FNFN
IN CY
CY IN
UT TC
EN BM
NC NC
BM EN
HC HC
TL TL
TC UT

### Example: Global Brain Waves

Most of this research was done by Ivan Abraham (UIUC).

Around 1400 fMRI traces (~15min each) from about 400 healthy individuals were analyzed. 69 regions of interest were singled out, and run through the cyclicity pipe.

A bit of context:

It is well known that slow intrinsic activity, as measured by resting-state fMRI in a variety of animals including humans, is organized into temporally synchronous networks. The question of whether intrinsic activity contains reproducible temporal sequences has received far less attention.
(from: Lag threads organize the brain’s intrinsic activity A Mitra, A Snydera, T Blazeya, ME Raichle, PNAS, 2015)

We selected the highest signal ROIs (based on an ellipse-fitting heuristics, highlighting the most pronounced ROIs in the COOM.
For those ROIs we derived the cyclic order and dis robustness testing and alignment testing for the imputed cycles. Here are the outlines of the results:

There are clear leaders and followers groups.

Once we know the leading and the lagging ROIs, we can investigate the leader-follower pairs to understand better the relationship between them.
Iterated integrals allow one to resolve the basic questions about the passing of the excitations through the regions.

Most of the iteracted integral (oriented area) is accumulated in irregularly spaced bursts. A typical burst is characterized y a pair of leader-follower peaks.
The neurophysiological meaning of these waves remains to be understood.

Again, accumulation of the oriented areas is strongly suggestive of the mental activity (the plot below shows the patterns of oriented areas during a sequence of social tasks during the measurement period):

### Conclusions

• We focused on a specific class of iterated integrals having advantage of being readily interpretable in geometric terms.
• In a variety of examples, spectral analysis of lead matrices seem to suggest an intrinsic leader-follower relationships between components of a multidimensional time series.
• Especially when coupled with the Chain Of Offsets Model, the computational pipe presented here is a novel, highly descriptive power tool in data analysis.