$\def\conf{\mathtt{conf}} \def\cov{\mathtt{cov}} \def\Int{\mathbb{Z}} \def\Real{\Bbb{R}} \def\Comp{\mathbb{C}} \def\ZN{\Int_N} \def\obst{{\mathcal{O}}} \def\dip{{\mathtt{P}}\vec{\mathtt{ath}}} \def\trans{\mathtt{Trans}} \def\U{{\mathcal U}} \def\Ed{{\mathcal E}} \def\Rd{{\Real}^d} \def\cN{{\mathbf N}} \def\OO{{\mathcal O}} \def\DD{{\mathcal D}} \def\HH{{\mathcal H}} \def\NN{{\mathcal N}} \def\VV{{\mathcal V}} \def\domain{{\mathcal{B}}} \def\ii{{\mathbf i}} \def\T{{\mathbb T}} \def\S{{\mathbb S}} \def\mod{{\mathit mod}} \def\Cycle{{\mathbf \sigma}} \def\prob{{\mathbb P}} \def\ex{{\mathbb E}} \def\codim{{\mathit{codim}}} \def\paths{{\mathcal P}} \def\sens{{\mathcal S}} \def\measurements{{\mathcal E}} \def\indep{{\mathcal I}} \def\one{\mathbb{I}} \def\dim{\mathtt{dim}} \def\nset{\mathbf{n}} \def\mset{\mathbf{m}} \def\Nat{\mathbb{N}} \def\types{\mathfrak{F}} \def\ideal{\mathcal{I}} \def\bz{\bar{z}} \def\bn{\bar{n}} \def\arrange{\mathcal{A}} \def\fld{\Bbbk} \def\attr{\mathfrak{A}} \def\circle{\mathbf{C}} \def\bx{\mathbf{x}} \def\br{\mathbf{r}} \def\tr{\mathtt{tr}} \def\S{\mathbb{S}} \def\ex{\mathbb{E}} \def\diag{\mathcal{D}} \def\noise{{\boldsymbol{\epsilon}}} \def\ex{\mathbb{E}} \def\var{\mathbb{V}} \def\coom{{\sc COOM}} \def\icl{\mathbf{T}} \def\iclm{{\mathcal{T}}} \def\itint{I}%\mathbf{I}} \def\s1{\mathbb{S}^1} \def\curve{\boldsymbol{\gamma}} \def\brm{\mathbb{B}} \def\mtree{\mathbf{T}} $

Energy/Intelligence Tradeoffs

yuliy baryshnikov

University of Illinois at Urbana-Champaign
mathematics and ECE

Intelligence vs Power Tradeoffs

This work started with Dan Koditschek asking: what are the tradeoffs between our ability to process information (Intelligence) and our expenditure of raw power to achieve our goals

A useful metaphor is the door handle:

  • One can either deploy complicated hand/arm motion to turn the handle (or operate the key), or
  • One can just break through the door.

We are still in the stage of looking for correct models. This talk will present two of them, both styllized, both presenting some version of the underlying idea.

First is more abstract, but richer; the other model is perhaps more realistic, but somewhat reductionist.

Model 1: Percolation

Percolation is a classical class of models going back to Ulam and Hammersley. We will be addressing the so-called first passage percolation.

In those models, one picks a path between far away points in the configuraion space (typically, \(\Nat^d\subset \Real^d\)) minimizing the weight of the path, \[ w(\gamma)=\sum_{p\in \gamma} w(p). \]

We restrict ourselves to a (simpler) class of directed percolation models, where the paths are moving only in specified directions. For simplicity, we will be dealing with \(\Real^2\), and all the path move S or E.

s In this case, the optimal cost of reaching a location starting at \((0,0)\) satisfies the standard (Bellman) recursion \[ E(k,l)=w(k,l)+\min (E(k-1,l),E(k,l-1)). \]

If the weights are independent, identically distributed (and regular enough, - say, bounded), then Kingman's subadditive ergodic theorem asserts that the weight of the optimal trajectory approximately behaves like a degree 1 homogeneous convex function of the endpoint.

Complexity of Optimal Trajectories

Among various components of Intelligence (ability to explore, to generalize, to learn,...) we concentrte on the simplest, and most basic one: memory.

Given some limitations on how much information can be stored, what is the weight of the optimal trajectory tht can be communicated to the agent executing it within the information constraints?

Clearly, as we restrict the class of trajectories available for comparison, the optimum increases (becomes worse). By how much?

The problem makes sense in the asymptotic regime: both the information (bit complexity) and the cost (energy/power) scale linealy with the size, making tradeoffs easily comparable.

As an ersatz complexity descriptior of a SE trajectory on the plane, we take its total number of turns, \[ T(\gamma)=\#(\mathtt{...SE...} \mbox{ or } \mathtt{...ES...}). \]

We will focus on square domains \([0,N]\times[0,N]\). The NE path from \((0,0)\) to \((N,N)\) takes at most \(2N\) turns.
In fact, a typical path takes about \(N\) turns: the number of trajectories with number of turns outside of the interval \(((1-\epsilon)N,(1+\epsilon)N)\) decreases exponentially with \(N\).

For the record, the number of bits (entropy) required to encode the trajectory with at most \(c\) turns is \[ \approx N|c\log(c/2)+(2-c)\log((1-c/2)|\ \mbox{ for } c<1; 2N\ \mbox{ for } c\geq 1. \]


Now we can investigate the dependence of the optimal cost, under the constraint that the number of turns of the NE-trajectory from \((0,0)\) to \((N,N)\) is at most \(cN\).

Here is a typical plot of the optimal cost as a function of the limit on the number of turns. Here the size of the domain is \(N=1000\), and the underlying weights are Bernoulli percolation, \(p=.5\).

Here's a similar percolation plot with the same \(N=1000\), but the cost density \(p=.3\) is below percolation threshold (so that with unlimited number of turns, one can essentially avoid all obstacles).

Again, Kingman theorem implies the following result:

For any \(x,y,c>0\), and an iid percolation weights (with, say, analytic Laplace transform at \(0\)), the optimal cost of reaching site\((xN,yN)\) with at most \(c\) turns scaled by \(N\) converges a.s. to a convex homogenous degree \(1\) function \(E(x,y,c)\).

Model 2: Hysteresis and Rational(izable) Inattention

Our second model was motivated by Jim, Noah et al paper
The proposed model is captured by the following flowchart:

Specifically, we assume that the discomfort associated with a mode of action \(q\), and the state \(k\) is given by function \(U=U_q(k)\).

Given the level of discomfort \(U_q(k)\), I posit that the chances to keep going in the current modus is \[ \exp(-\alpha U). \] (Here \(1/\alpha\) is the "pain tolerance" parameter.) In other words, the more difficult the environment seems, the more inclined one is to explore.

If the exploration is on, the action is selected probabilistically, with logistic (a.k.a. hyperbolic tangent) switching probability: \[ k(q)=\frac{\exp(-\beta U_q(k))}{\sum_p \exp(-\beta U_p(k))}. \] This is a standard S-shaped acceptance function, - other S-shaped functions possible, but I expect the results being pretty robust with respect to variations.


Here are the results of the simulations (here \(N=20; \alpha=1.2/N; \beta=12/N\)):
To make sure the discomfort that triggers the exploration phase is critical for the hysteresis, here are the results for \(\alpha=.2/N; \beta=12/N\) (low \(\alpha\) implies that the exploration phase is almost certain):


  • Tradeoffs between the expenditure of mindless raw power and intelligent behavior are manifest in simulations, and, perhaps, in real life.
  • This implies that the intelligence is costly, and is not just a hard constraint.
  • Should be a fixture in any modeling efforts.