Consider some function of time, \(\mathbf{x}:T\to\mathbb{R}^d\), describing a certain phenomenon. What does this function tell us if we are very uncertain about the time parametrization?
In other words, we are forced to assume that our time
readings are a corrupted, reparameterized version of the
actual clock, perhaps even with backtracking:
\[
\mathbf{x}(t)=\mathbf{X}(\tau(t)), \tau:\mathbb{R}\to\mathbb{R}.
\]
Under such assumptions, what can we learn about the phenomenon?
The problem of finding reparametrization invariant descriptors of time series emerged from a very practical observation:
All periodic phenomena are cyclic, but not vice versa.
The distinction is important. There are plenty of processes that are manifestly cyclic, but not periodic, in the sense that no period \(P\) can be found, such that the shift of the process along the time axis by $P$ leaves the process, or its parameters, invariant.
Examples of cyclic yet not periodic processes include the cardiac cycle; movements of the musculoskeletal complex exercised during gait; population dynamics in closed ecosystems; business cycles, and many others.
The predominant focus on periodic processes can be explained by the availability of powerful tools exploiting harmonic analysis (Fourier/Laplace/Cosine transforms, wavelets etc).
These tools are, however, intrinsically limited by the fact that they rely on the structure of the timeline as a homogeneous space.
Without this structure, the natural bases of harmonic analysis, consisting of characters of the group of shifts, just do not exist:
\[
\begin{array}{l}
s_t:x\mapsto x+t,\\
s_t^*e_\alpha=c_\alpha(t)e_\alpha.
\end{array}
\]
Consider multidimensional time series: trajectories with values in a $d$-dimensional Euclidean space $V\cong\mathbb{R}^d$, with coordinates $x_j,\ j=1,\ldots,d$.
Before attacking the specific problem of recovering the cyclic nature of the observed processes, we will address the general question of reparametrization-invariant functionals of trajectories (that is, mappings $\curve:I=[0,1]\to V$): what are the descriptors, functions of trajectories, that do not depend on how one traverses them?
This question is classical, and has an answer in terms of iterated path integrals, a notion going back to Picard, reintroduced into modern mathematical practice by K.T. Chen (while he was working at the University of Illinois at Urbana-Champaign).
K.T. Chen was using them to understand some topological features of maps between manifolds; our applications do not need any topological context, however.
Iterated integrals are functions of the trajectories defined recursively: integrals of higher orders via the integrals of lower order. They form a filtered space of functions on the time interval where the trajectories are defined.
The integrals of order $0$ are the constants.
The integrals of order $k$ are given by \[ a(\curve)(t)=\int_0^t \sum_l a_l(s)d\curve_l(s), \] where all $a_l$'s are iterated integrals of order $\lt k$.
Equivalently, the space of iterated path integrals (for smooth trajectories) is spanned by \[ a(\curve):=\int\cdots\int_{0\lt t_1\lt\ldots\lt t_k\lt 1} \dot{\curve}_{l_{1}}(t_1)\cdots \dot{\curve}_{l_k}(t_k)dt_1\ldots dt_k. \]
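For sampled trajectories, the low-order iterated integrals are easy to approximate by Riemann sums. Here is a minimal Python sketch (the function name is mine, for illustration only) that computes the first- and second-order integrals with a midpoint rule and checks numerically that a monotone time change leaves them unchanged:

```python
import numpy as np

def iterated_integrals_order2(x):
    """First- and second-order iterated integrals of a sampled path x,
    given as an array of shape (T, d).  The entry I2[k, l] approximates
    the integral of d(gamma_k)(t1) d(gamma_l)(t2) over 0 < t1 < t2 < 1,
    computed with a midpoint rule."""
    dx = np.diff(x, axis=0)
    increments = x[-1] - x[0]               # order-one integrals
    mid = 0.5 * (x[:-1] + x[1:]) - x[0]     # path traversed so far, at midpoints
    I2 = mid.T @ dx                         # I2[k, l] = sum (x_k - x_k(0)) dx_l
    return increments, I2

# Reparametrization invariance: traverse the same circle at two speeds.
t = np.linspace(0.0, 1.0, 2001)
path = lambda u: np.stack([np.cos(2 * np.pi * u), np.sin(2 * np.pi * u)], axis=1)
_, I2_uniform = iterated_integrals_order2(path(t))
_, I2_warped = iterated_integrals_order2(path(t**2))  # monotone time change
print(np.allclose(I2_uniform, I2_warped, atol=1e-2))  # True
# The antisymmetric part recovers twice the signed area of the unit circle:
print(abs(I2_uniform[0, 1] - I2_uniform[1, 0] - 2 * np.pi) < 1e-3)  # True
```

The second check anticipates the discussion of oriented areas below: for a closed curve, the antisymmetric combination of the second-order integrals is twice the enclosed signed area.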
In particular, the integrals of order one are increments over the interval of observation, and we'll spend some time dealing with the second order integrals...
Iterated integrals are reparametrization invariant.
They are also unaffected by detours. (We say that a path $\curve$ has a detour if it can be represented as \[ \curve_s\curve_d\curve_d^{-1}\curve_f, \] a piece of trajectory traveled back and forth.)
A theorem by Chen implies an especially attractive feature of the iterated integrals: up to detours and reparametrizations, they form a complete algebra of functions on trajectories:
Two trajectories (without detours) have the same iterated integrals of all orders if and only if the trajectories coincide up to a reparametrization and a shift in \(V\).
The dimensions of the spaces of iterated integrals of a given order grow pretty fast, and their structure is related to some classical algebra and combinatorics (Hall bases, the Baker-Campbell-Hausdorff-Dynkin formula, ...); we won't go there.
To reiterate:
Two trajectories \[ \curve,\curve':[0,T]\to V \] (without detours) have the same iterated integrals of all orders, if and only if the trajectories coincide up to a reparametrization and shift.
This suggests that any exploratory data analysis of parametrization-independent features of time series can rely on these functionals: they form a rich enough toolbox.
Just create a zillion iterated integrals, and stare at them.
Remark: Over the past few years, the idea to use iterated integrals as a tool in data analysis has been promoted by T. Lyons and his students under the name of path signatures.
So now we know that the iterated integrals contain all the information about the trajectory.
But it is always good to have an interpretation for the readings of your sensors.
The first order iterated integrals just measure the total increments.
The second order integrals also have some clear interpretation.
Namely, the second-order iterated integrals (mod first-order integrals) are the algebraic areas encircled by the projections of the trajectory to the 2D coordinate planes.
The space of iterated integrals of the second order is spanned by \[ A_{kl}:=\frac{1}{2}\int \curve_l\, d\curve_k - \curve_k\, d\curve_l, \] the oriented areas of the two-dimensional projections of the trajectory $\curve$ on the coordinate planes (with starting and ending points connected by a straight segment).
So what do these oriented areas mean practically?
Here is our key heuristics: if one effect precedes another, the oriented area on the parametric joint plot of the corresponding functions will surround a positive algebraic area.
If two waves are following one another, the oriented area encircled by their parametric plot is positive or negative, depending on their temporal order.
Of course, the semantics of this assumes that $k$ and $l$ are consubstantial; if their natures are antagonistic, the lead-follow relation flips.
We take this idea as our primitive on which we will build.
Collectively, the iterated integrals of order 2 (the oriented areas of the projections of the trajectory to the 2D coordinate planes) capture the pairwise leader-follower relationships.
We arrange the iterated integrals \[ A_{kl}:=\frac{1}{2}\int \curve_l\, d\curve_k - \curve_k\, d\curve_l \] into a skew-symmetric matrix $$ A=(A_{kl})_{1\leq k,l\leq d} $$ called the lead matrix.
The interpretation of the lead matrix coefficients is that an element $A_{kl}$ is positive if $l$ follows $k$, negative if $l$ leads $k$.
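As a sanity check, the lead matrix can be computed directly from samples. Below is a minimal Python sketch (the function name is mine): for two unit cosines offset by 0.1 of a period, the computed entry matches the value $-\pi\sin(0.2\pi)$ obtained by evaluating the integral in closed form.

```python
import numpy as np

def lead_matrix(x):
    """Sampled lead matrix A[k, l] = (1/2) * integral(x_l dx_k - x_k dx_l),
    computed with a midpoint rule; x has shape (T, d)."""
    dx = np.diff(x, axis=0)
    mid = 0.5 * (x[:-1] + x[1:])
    M = mid.T @ dx                 # M[k, l] = sum x_k dx_l
    return 0.5 * (M.T - M)         # A[k, l] = (1/2)(sum x_l dx_k - sum x_k dx_l)

# Two copies of one harmonic, the second delayed by 0.1 of the period.
t = np.linspace(0.0, 1.0, 4001)
x = np.stack([np.cos(2 * np.pi * t), np.cos(2 * np.pi * (t - 0.1))], axis=1)
A = lead_matrix(x)
print(A[0, 1])                     # close to -pi * sin(0.2 * pi)
print(np.allclose(A, -A.T))        # True: the lead matrix is skew-symmetric
```

Note that the overall sign of the entries depends on the sign convention adopted for $A_{kl}$; what is invariant is that swapping the leader and the follower flips the sign.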
Specializing, we concentrate on a class of phenomena described by the chain of offsets model.
Under this model, the observed variables are (perhaps noisy and rescaled) shifts of the same function on a 1D time manifold, either the real line or the circle: \[ \curve_l(\tau)=c_l \phi(\tau - \alpha_l),\quad l=1,\ldots,d. \] Here $\phi$ is the archetypal behavior, $c_l\gt 0$ are scaling parameters, and $\alpha_l$ are phases.
Informally, under the chain of offsets model, the observables are retracing the same behavior, with some delays.
Even though we assume homogeneity of the domain of the observed functions in our model, we rely on explicitly reparametrization-invariant methods in deriving any inferences.
The central question we will address now is the recovery of the order of the functions $\curve_l, l=1,\ldots,d$ (or, equivalently, of the offsets $\alpha_l, l=1,\ldots,d$) from the trajectory. This is done using spectral analysis of the sampled lead matrix.
For some baseline situations the structure of the lead matrix is easy to describe.
Let us start with the underlying space being the circle.
Expand the primary function $\phi$ into the Fourier series, \begin{equation} \label{eq:phi} \phi(\tau)=\sum_k c_k \exp(2\pi i k \tau). \end{equation}
Then it is immediate that the lead matrix $A$ over a period is given by $$ A=\left(\psi(\alpha_k-\alpha_l)\right)_{kl}, $$ where the lead kernel $\psi$ is given by \[ \psi(t)=2\pi\sum_m m|c_m|^2\sin(2\pi m t). \]
Equivalently, the entries of the lead matrix are (proportional to) the derivative of the symmetric convolution of $\phi$ with itself: \[ \psi\sim \frac{d}{dt}\left(\int \phi(\tau)\phi(\tau-t)\,d\tau\right). \]
The same relation holds for the functions on the real line.
Often the lead kernel is dominated by a single harmonic, i.e. $m|c_m|^2$ is much larger for one $m$ than for all the other terms (this is the case, e.g., when $\phi$ itself is just a single harmonic).
Then the skew-symmetric matrix $A$ would have rank two, spanned by $A^\pm$, where $$ A^+_{kl}=c_kc_l \sin(\alpha_k)\cos(\alpha_l) $$ and $$ A^-_{kl}=c_kc_l \cos(\alpha_k)\sin(\alpha_l). $$
As these vectors reflect the intrinsic order of the signals (say, a wave propagating through the medium, measured by an array of sensors), the spectral decomposition would reconstruct it.
A skew-symmetric matrix of rank 2 has the form $u\otimes v^\dagger-v\otimes u^\dagger$ and its nontrivial eigenvectors lie in the space spanned by $u$ and $v$.
For example, in the case when \(u,v\) are orthogonal the eigenvectors are \[ (1\mp i)v+(1\pm i)u. \]
Hence, the corresponding (complex conjugated) eigenvectors of this rank 2 matrix are given by linear combinations of \[ v_1=e^{i\alpha}\left(e^{2\pi i\alpha_1},e^{2\pi i\alpha_2},\ldots,e^{2\pi i\alpha_d}\right), v_2=\overline{v}_1 \] (for some phase $\alpha$).
In other words, one can reconstruct the cyclic order of the components.
The spectral decomposition of $A$ and the arguments of the components of the leading eigenvector then yield the desired cyclic order.
The spectrum of the sampled lead matrix by itself can serve as an indicator of the explanatory power of the cyclicity algorithm. In general, the spectrum of the skewsymmetric lead matrix consists of purely imaginary pairs $\pm i\lambda_j$, and zeros.
We assume that the absolute values of the spectral pairs are ordered: $$ \lambda_1=\lambda_2\geq \lambda_3\geq\ldots $$ The ratio $\lambda_1/\lambda_3$ indicates how well the lead matrix is described by the noiseless single-harmonic model (for which the ratio is $+\infty$).
The situation becomes even more interesting for the chain of offsets model on the real line.
Consider a chain of offsets forming an arithmetic progression, for a signal that is a solitary pulse.
In this case, the lead matrix is a skew-symmetric Toeplitz matrix.
Approximating it by the convolution with the lead kernel shows that its leading eigenvalue corresponds to the maximum of $s\mapsto|\hat{\psi}(s)|^2$, the (squared modulus of the) Fourier transform of the lead kernel, and the corresponding eigenfunction is the harmonic at that frequency.
The components of the eigenvector corresponding to the leading eigenvalue wind around the origin, as expected.
Chaining them in the visually obvious fashion provides a perfect reconstruction of the offset order. If the lead matrix is noisy, however, the recovery of the natural order of the offsets becomes much more problematic. Can one still reconstruct the order of the offsets from the data?
We can use the slice-and-splice technique familiar from DNA sequencing or (in TDA) from the Mapper algorithm.
Partition the components of the eigenvector by overlapping sectors. Within each sector, cluster the components using the absolute values of the lead matrix as the proximity measures: the higher the entry, the closer the components.
Integrating the resulting partial orders gives the overall ordering.
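A toy version of the slice-and-splice step can be sketched in Python as follows (this simplified sketch orders within each sector by phase alone; the actual pipeline would instead use the magnitudes of the lead-matrix entries as the proximity measure, as described above):

```python
import numpy as np

def cyclic_order_by_sectors(phases, n_sectors=6):
    """Partition eigenvector-component phases into overlapping sectors
    of the circle, order each sector internally, and splice the partial
    orders by walking around the circle."""
    two_pi = 2 * np.pi
    ph = np.mod(np.asarray(phases), two_pi)
    width = two_pi / n_sectors
    order = []
    for s in range(n_sectors):
        lo = s * width
        hi = lo + 1.5 * width          # sectors overlap by half a width
        # members of the sector, including wrap-around past 2*pi
        idx = [i for i in range(len(ph))
               if lo <= ph[i] < hi or lo <= ph[i] + two_pi < hi]
        idx.sort(key=lambda i: np.mod(ph[i] - lo, two_pi))
        for i in idx:                  # splice: append components not seen yet
            if i not in order:
                order.append(i)
    return order

print(cyclic_order_by_sectors([2.1, 0.3, 5.5, 1.2, 3.9]))  # [1, 3, 0, 4, 2]
```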
We combined the ideas outlined above into a pipeline, resulting in a software tool for recovering the cyclic structure among the observed time series, under the assumption of the chain of offsets model.
The workflow consists of computing the sampled lead matrix, its eigenvalue decomposition, and the recovery of the cyclic order from the phases of the leading eigenvector.
Some examples follow.
These results were obtained by Ivan Abraham (UIUC).
We consider a simple model (first passage percolation, gossip, ...) of signal propagation in a network.
A node broadcasts a sequence of pulse-like signals to its neighbors, who pass it on.
Can we reconstruct the distances from the seed node by computing the iterated integrals?
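A minimal Python simulation of this setup (the graph, pulse shape, and all parameters here are invented for illustration): pulses are re-broadcast with a fixed delay per hop, so each node's signal is the seed's signal offset by its graph distance, which puts us squarely in the chain of offsets model.

```python
import numpy as np
from collections import deque

# a small hypothetical network, as an adjacency list
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

def bfs_distances(graph, seed):
    """Hop distances from the seed node (breadth-first search)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pulse_train(t, offset, width=0.02, period=1.0, delay=0.1):
    """Gaussian pulses emitted once per period, delayed by `delay` per hop."""
    phase = np.mod(t - delay * offset, period)
    return np.exp(-0.5 * ((phase - 0.5 * period) / width) ** 2)

dist = bfs_distances(graph, seed=0)
t = np.linspace(0.0, 3.0, 3000)
signals = np.stack([pulse_train(t, dist[v]) for v in sorted(graph)], axis=1)
print([dist[v] for v in sorted(graph)])   # [0, 1, 1, 2, 3]
```

Feeding `signals` into the lead-matrix machinery above would then let one compare the recovered cyclic order with the hop distances.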
We ran the standard pipeline: eigenvalue decomposition, then selection of the eigenvectors corresponding to the leading eigenvalue.
One can see the intermediate results below:
The output of the reordering is shown below:
One interesting way to look at the data is to track the accumulation of the oriented areas in time:
The US economy goes through highly aperiodic cycles of business activity. At the aggregate level, the key indicator is GDP growth, but many correlates of business activity exist, most notably the unemployment data, bond returns, etc.
The data are notoriously noisy.
Investors have long sought to exploit the dependence of stock returns on the stages of the business cycle.
We considered the sectoral indices (as reported by Standard & Poor's) for the daily closings, for the years 2003-2012.
These are the aggregates corresponding to the subsets of stock values tracked by S&P as a part of their industrial portfolio.
It is "well known" that these industrial sectors exhibit some kind of cyclic behavior (some showing better returns at the start of a business cycle; others towards its end).
Here are the standard definitions of the industries:


Assuming the chain of offsets model, we analyzed 10 time series of the industrial sectors. The data were downloaded from
The resulting (cyclic) order, shown below, is in a close agreement with the financial "folklore knowledge":
Cyclicity   Fidelity
---------   --------
FN          FN
IN          CY
CY          IN
UT          TC
EN          BM
NC          NC
BM          EN
HC          HC
TL          TL
TC          UT
Most of this research was done by Ivan Abraham (UIUC).
Around 1400 fMRI traces (~15 min each) from about 400 healthy individuals were analyzed. 69 regions of interest (ROIs) were singled out and run through the cyclicity pipeline.
A bit of context:
It is well known that slow intrinsic activity, as measured by resting-state fMRI in a variety of animals including humans, is organized into temporally synchronous networks. The question of whether intrinsic activity contains reproducible temporal sequences has received far less attention. (From: "Lag threads organize the brain's intrinsic activity", A. Mitra, A. Snyder, T. Blazey, M.E. Raichle, PNAS, 2015.)
We selected the highest-signal ROIs (based on an ellipse-fitting heuristic, highlighting the most pronounced ROIs in the COOM).
For those ROIs we derived the cyclic order and did robustness testing and alignment testing for the imputed cycles. Here are the outlines of the results:
There are clear groups of leaders and followers.
Once we know the leading and the lagging ROIs, we can investigate the leader-follower pairs to understand better the relationship between them.
Iterated integrals allow one to resolve the basic
questions about the passing of the excitations through
the regions.
Most of the iterated integral (oriented area) is accumulated in irregularly spaced bursts. A typical burst is characterized by a pair of leader-follower peaks.
The neurophysiological meaning of
these waves remains to be understood.
Again, the accumulation of the oriented areas is strongly suggestive of mental activity (the plot below shows the patterns of oriented areas during a sequence of social tasks during the measurement period):