The mathematics of Google

Minicourse for AGORA days

Course Description

How does Google sort through the roughly 5,000,000,000 webpages to find relevant sites? It turns out that the answer involves the mathematics of linear algebra, which, roughly speaking, involves solving systems of linear equations. Such systems are ubiquitous in science and engineering. In this whirlwind class, we'll cover about 2/3 of the syllabus of a class usually taken by junior engineering majors, learning about Gaussian elimination, linear transformations and change of basis, eigenvalues and eigenvectors, culminating with the page rank algorithm developed by Page and Brin as grad students at Stanford.

  • Lecture 1 Introduction to systems of linear equations: this is nothing more than solving a system like 2x+3y=6, x-2y=2 (run amok). The algorithm which produces solutions is known as Gaussian elimination. We'll also study the basics of matrix multiplication, and look at a simple example of a Markov process: suppose in Smallville, every year 30% of nonsmokers begin smoking, and 20% of smokers quit. If initially there are 8000 nonsmokers and 2000 smokers, what are the numbers at the end of 100 years? There is a beautiful and elegant solution to this problem, which we begin to tackle in the next class.

  • Lecture 2 The problem introduced in the previous class can be solved by successive matrix multiplications A*v(i)= v(i+1), where v(i) is the vector representing the # of nonsmokers and smokers at time i (hence v(0) = (8000,2000)), and A is the transition matrix
    .7 .2
    .3 .8
    so that A*v(0) = v(1) = (6000,4000). Now iterate this process. Of course, with large matrices this is very costly. The solution is to find a better way to represent the matrix A. This leads us to the notion of linear transformation and change of coordinates; a change of coordinates allows us to replace A with a simpler matrix, without losing information.

  • Lecture 3 With the notion of change of coordinates and linear transformation under our belt, we introduce the notions of eigenvalue and eigenvector. Formally, an eigenvector for a matrix A is a vector v such that Av=cv, for some constant c. Despite the fancy German names, these are very concrete objects and there is a simple algorithm to compute them. In particular, eigenvectors give us the desired simple change of coordinates that allows us to solve systems analogous to the one above easily and by hand! This will be the key to the page rank algorithm.

  • Lecture 4 Today is the capstone of the course, where we use the tools developed to understand the page rank algorithm. The key idea is to rank a page by the number of pages which point to it, so essentially we are representing the web as a directed graph (a set of nodes and edges between nodes, with edges having direction). There are some problems to overcome, because someone surfing the web may not follow a link from the current page, but instead jump to another page. The solution to this problem is simple and elegant, and involves perturbing the transition matrix A (which encodes the directed graph modelling the web) by a matrix representing the possibility the user makes a jump.

    Other information: While the class really requires no mathematical prerequisites, you should bring an enjoyment of mathematics and willingness to learn.


    Dr. Hal Schenck
    Mathematics Department, University of Illinois