# Numerical Linear Algebra (Part 1)¶

## NumPy arrays versus SciPy sparse matrices¶

In [2]:
from numpy import identity, savetxt
from scipy.io import savemat

n = 1000
denseI = identity(n, dtype='float64')
savetxt('tmp/denseI.txt', denseI)
dense_dict = {'denseI': denseI}
savemat('tmp/denseI.mat', dense_dict)

In [3]:
!ls -l tmp/denseI*

-rw-r--r--  1 hirani  staff   8000192 Jun 13 09:36 tmp/denseI.mat
-rw-r--r--  1 hirani  staff  25000000 Jun 13 09:36 tmp/denseI.txt

In [4]:
from scipy.sparse import identity

sparseI = identity(n, dtype='float64', format='csr')
sparse_dict = {'sparseI' : sparseI}
savemat('tmp/sparseI.mat', sparse_dict)

In [5]:
!ls -l tmp/sparseI*

-rw-r--r--  1 hirani  staff  16216 Jun 13 09:36 tmp/sparseI.mat


## Some SciPy sparse formats¶

In [6]:
from numpy import array
from scipy.sparse import coo_matrix

rows = array([0, 0, 1, 2, 2, 2])
cols = array([0, 2, 2, 0, 1, 2])
data = array([1, 2, 3, 4, 5, 6])
C = coo_matrix((data, (rows, cols)), shape=(3,3))
C.todense() # converting to NumPy matrix to print contents easily

Out[6]:
matrix([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
In [7]:
C

Out[7]:
<3x3 sparse matrix of type '<type 'numpy.int64'>'
with 6 stored elements in COOrdinate format>
In [8]:
C.nnz

Out[8]:
6
In [9]:
C.T.todense()

Out[9]:
matrix([[1, 0, 4],
[0, 0, 5],
[2, 3, 6]])
In [10]:
from scipy.sparse import csr_matrix

indptr = array([0,2,3,6]) # row 0 indices start at index 0 of indices,
# row 1 indices start at index 2 of indices
# row 3 indices start at index 3 of indices
# Last item is length of data
indices = array([0,2,2,0,1,2]) # column numbers
data = array([1,2,3,4,5,6])
R = csr_matrix((data, indices, indptr), shape=(3,3))
R.todense()

Out[10]:
matrix([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
In [11]:
R.tocoo()

Out[11]:
<3x3 sparse matrix of type '<type 'numpy.int64'>'
with 6 stored elements in COOrdinate format>
In [12]:
R

Out[12]:
<3x3 sparse matrix of type '<type 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [13]:
C.tocsr()

Out[13]:
<3x3 sparse matrix of type '<type 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [14]:
C

Out[14]:
<3x3 sparse matrix of type '<type 'numpy.int64'>'
with 6 stored elements in COOrdinate format>
In [15]:
from scipy.sparse import lil_matrix

L = lil_matrix((3,3), dtype='float64')

In [16]:
L

Out[16]:
<3x3 sparse matrix of type '<type 'numpy.float64'>'
with 0 stored elements in LInked List format>
In [17]:
L.todense()

Out[17]:
matrix([[ 0.,  0.,  0.],
[ 0.,  0.,  0.],
[ 0.,  0.,  0.]])
In [18]:
L

Out[18]:
<3x3 sparse matrix of type '<type 'numpy.float64'>'
with 0 stored elements in LInked List format>

## Indexing and slicing not allowed for all formats¶

In [19]:
L[0,0] = 2; L[1,2] = 3

In [20]:
L.todense()

Out[20]:
matrix([[ 2.,  0.,  0.],
[ 0.,  0.,  3.],
[ 0.,  0.,  0.]])
In [21]:
L[1,1:3].todense()

Out[21]:
matrix([[ 0.,  3.]])
In [22]:
L[1,1:3]

Out[22]:
<1x2 sparse matrix of type '<type 'numpy.float64'>'
with 1 stored elements in LInked List format>
In [23]:
C[1,1:3]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-26033b4bc06a> in <module>()
----> 1 C[1,1:3]

TypeError: 'coo_matrix' object has no attribute '__getitem__'
In [24]:
R[1,1:3]

Out[24]:
<1x2 sparse matrix of type '<type 'numpy.int64'>'
with 1 stored elements in Compressed Sparse Row format>
In [25]:
C[0,0] = 2; C[1,2] = 3

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-00a7db0f6669> in <module>()
----> 1 C[0,0] = 2; C[1,2] = 3

TypeError: 'coo_matrix' object does not support item assignment
In [26]:
R[0,0] = 111; R[1,2] = 999
R.todense()

Out[26]:
matrix([[111,   0,   2],
[  0,   0, 999],
[  4,   5,   6]])

## Exercise 1¶

1. Write the CSR representation for the matrix: $$A= \left[ \begin{array}{ccc} 1 & 0 & 3 \\ 0 & 5 & 0 \\ 0 & 0 & 2 \end{array} \right]$$ and then use that to create a CSR matrix. Then create the same matrix by first creating the appropriate COO matrix.
2. Construct a dense 5 x 5 matrix of floats with 2 along the main diagonal, and -1 along the diagonal above and below the main diagonal. We will refer to this type of matrix as a discrete 1D Laplacian (we'll see the reason for the name in the differential equation unit). You can use numpy.eye or scipy.linalg.toeplitz. If you use the eye then you can build identity matrices using eye(n) and zero matrices with an off-diagonal of ones using eye(n, k=...|-1|0|1|...).

## LU and Cholesky decompositions¶

First let's see how to solve a simple linear system using scipy.linalg without worrying about the choices available amongst the solving techniques. We will cover that below. First suppose you have a small dense numpy array and just want to solve a linear system once.

In [2]:
from scipy.linalg import solve
import numpy as np

A = np.array(range(1,9) + [0], dtype=float).reshape((3,3))
print A
b = np.array([1., 0, 2])
x = solve(A, b)
print
print x

[[ 1.  2.  3.]
[ 4.  5.  6.]
[ 7.  8.  0.]]

[-2.          2.         -0.33333333]


You could try to get a sense of how well it did by computing $A\; x$ and seeing how close it is to $b$. But one of the non-obvious things for beginners is that this does not always present an accurate picture of the error. This is due to the idea of condition number of a matrices that we will look at later. But for now lets just compute $A\; x$ using the solution $x$ obtained above.

In [3]:
np.dot(A, x)

Out[3]:
array([ 1.,  0.,  2.])

Solving linear systems with the Gaussian elimination algorithm is typically implemented in two phases. The Gaussian elimination algoritm is used to produce a factorization of matrix $A$ $$A = P\, L\, U$$ where $P$ is a permutation of the identity matrix, $L$ is a lower triangular matrix with ones on the diagonal and $U$ is an upper triangular matrix. For an order $n$ matrix this step takes $O(n^3)$ time.

That's the expensive phase of the linear solve. The next phase involves solving a triangular system and that takes $O(n^2)$ time. If repeated solves using various right hand side $b$ are needed then factorizing first and then solving is recommended.

The second phase is implemented as follows. We want to solve $A\, x = b$. Since $A = P\,L\,U$ this is equivalent to solving $P\, L\, U\, x = b$. Let $U\, x = y$ for some unknown vector $y$. Then we want to solve $P\, L\, y = b$ or equivalently solve $L\, y = P^T\, b$ for $y$. Once you have $y$ you can solve for $x$ by solving $U\, x = y$.

Since the two solves are triangular solves each takes $O(n^2)$ time. Thus if solution is needed for many different RHS vectors $b$ then the above procedure is better than doing Gaussian elimination each time. Factorizing once with $O(n^3)$ and solving multiple times with $O(n^2)$ is better.

The scipy.linalg module has a function lu which takes an array and returns its LU decomposition. In addition, the module provides a function solve_triangular(T, b, lower=True|False) which efficiently solves triangular matrices. We can combine these to solve the linear system we solved abovebut this time using the procedure outlined above.

In [27]:
import numpy as np

In [5]:
from scipy.linalg import lu, solve_triangular

A = np.array(range(1,9) + [0], dtype=float).reshape((3,3))
b = np.array([1., 0, 2])
P, L, U = lu(A)
y = solve_triangular(L, np.dot(P.T, b), lower=True)
x = solve_triangular(U, y)
print x

[-2.          2.         -0.33333333]


Similarly there is a function in scipy.linalg for matrices with other commonly occuring structure. Many matrices are banded, in that there is a band of nonzero entries only in some banded neigborhood of the diagonal. Correspondingly there is solve_banded function for this. For Hermitian (or symmetric in the real case) matrices there are special functions. You should explore the scipy.linalg reference pages.

Lets take another matrix and examine the factors $P$, $L$, and $U$.

In [28]:
A = np.array([[2,1,1,0], [4,3,3,1], [8,7,9,5], [6,7,9,8]], dtype=float)
print A

[[ 2.  1.  1.  0.]
[ 4.  3.  3.  1.]
[ 8.  7.  9.  5.]
[ 6.  7.  9.  8.]]

In [29]:
from scipy.linalg import lu

P, L, U = lu(A)
print L

[[ 1.          0.          0.          0.        ]
[ 0.75        1.          0.          0.        ]
[ 0.5        -0.28571429  1.          0.        ]
[ 0.25       -0.42857143  0.33333333  1.        ]]

In [30]:
print U

[[ 8.          7.          9.          5.        ]
[ 0.          1.75        2.25        4.25      ]
[ 0.          0.         -0.85714286 -0.28571429]
[ 0.          0.          0.          0.66666667]]

In [31]:
print P

[[ 0.  0.  0.  1.]
[ 0.  0.  1.  0.]
[ 1.  0.  0.  0.]
[ 0.  1.  0.  0.]]


$P$ is a permutation of the identity matrix of order 4 and $P^T\, P = P\, P^T = I$.

In [44]:
np.dot(P.T, P)

Out[44]:
array([[ 1.,  0.,  0.,  0.],
[ 0.,  1.,  0.,  0.],
[ 0.,  0.,  1.,  0.],
[ 0.,  0.,  0.,  1.]])
In [32]:
print np.dot(L, U)

[[ 8.  7.  9.  5.]
[ 6.  7.  9.  8.]
[ 4.  3.  3.  1.]
[ 2.  1.  1.  0.]]

In [33]:
np.dot(P, np.dot(L, U))

Out[33]:
array([[ 2.,  1.,  1.,  0.],
[ 4.,  3.,  3.,  1.],
[ 8.,  7.,  9.,  5.],
[ 6.,  7.,  9.,  8.]])
In [34]:
print A

[[ 2.  1.  1.  0.]
[ 4.  3.  3.  1.]
[ 8.  7.  9.  5.]
[ 6.  7.  9.  8.]]


For stability, in general Gaussian elimination needs to be done with at least partial pivoting which involves the use of the largest entry in each column and swapping rows.

If $A$ is symmetric positive definite ($x^T\, A\, x \ge 0$ for all $x \ne 0$) then pivoting is not needed for stability. In this case, if $A$ is factorized as $LU$ with $L$ and $U$ having the same diagonal elements, then $U$ is just $L^T$ and you get the Cholesky factorization of $A$: $$A = L \, L^T\, .$$ Since pivoting is not needed, there is no permutation matrix produced in the case of Cholesky. Some people use Cholesky to discover if a symmetric matrix is positive definite.

In [35]:
from scipy.linalg import cholesky

A = np.array([[2.,-1,0], [-1,2,-1], [0,-1,2]])
print A
L = cholesky(A, lower=True)
np.dot(L, L.T)

[[ 2. -1.  0.]
[-1.  2. -1.]
[ 0. -1.  2.]]

Out[35]:
array([[ 2., -1.,  0.],
[-1.,  2., -1.],
[ 0., -1.,  2.]])

Once again, if the structure of the matrix is known, it is a good idea to see if an appropriate factorization and/or solve function is available. For example a banded matrix can be solved much faster with a specialized function than with general elimination functions.

## Exercise 2¶

1. Create three random 5 x 5 matrices $A, B$ and $C$ and pick a random vector $b$ with 5 entries. Compute $x = B^{-1}\, (2A + I)\,(C^{-1} + A)\; b$ without computing any matrix inverses.
2. Use LU factorization solve a 100 x 100 random linear system using the procedure outlined above. To check the quality of the solution, first start with a known random vector xtrue and compute the RHS vector b as dot(A, xtrue). This way we will have the true solution available to verify the answer. Now pretend you don't know the solution and solve $A\; x = b$ for unknown $x$ using the $b$ you generated above. Compare the resulting solution x with the true solution xtrue by computing the norm of the difference. This can be done using numpy.linalg.norm.

## Iterative solvers¶

Can linear systems be solve by using a few matrix vector multiplications? If there was an iterative procedure that produced increasingly accurate solution one could stop it after a few iterations if only an approximate solution is desired. Or maybe the iterations will reduce error so fast that in a few iterations, each costing $O(n^2)$ for matrix vector products we can produce an accurate enough answer.

Suppose a guess $x_0$ for the solution is available. If we could compute the error $e_0 = A^{-1}\, b - x_0$ from the guess then we would be done since the solution would be $x_0 + e_0 = x_0 + A^{-1}\, b - x_0 = A^{-1}\; b$. The catch is of course that if we had the inverse of $A$ available then we would already have been done!

The thing we do have a way to compute from the existing information is the residual for the guess, $r_0 = b - A\, x_0$. The residual is related to the error: \begin{align} r_0 &= b - A\, x_0\\ &= A\;(A^{-1}\, b - x_0)\\ &= A\, e_0 \end{align}

In general if the solution at end of iteration $k$ is $x_k$ and the error $e_k$ then the residual at that stage is related to the error by $$r_k = A\; e_k$$

Now suppose we have a matrix $M$ available that is easy to invert and has the additional property that $M^{-1}\; A$ is approximately (in some sense of the word) the identity matrix. The we can pretend to compute the error by replacing $A$ in $r_k = A\; e_k$ by $M$ and hope that this gives us an approximation of the error to use in correcting the current guess for the solution.

This can be turned into an iterative algorithm to solve $A\; x = b$ using only matrix vector multiplications as the most expensive part of the algorithm.

Initial guess for solution is x
r = b - A x
Solve M z = r
while not satisfied
x = x + z
r = b - A x
Solve M z = r


If $M$ is taken to be the diagonal of $A$ then the resulting iteration is called Jacobi iteration and if $M$ is the lower triangular part of $A$ the iteration is called Gauss-Seidel iteration.

The more modern iterative methods are conjugate gradient method that should be the first thing to try for symmetric positive definite matrices and the GMRES method for general matrices. The details of these methods are beyond the scope of this but you should definitely learn to use these as available in scipy.sparse.linalg.

## Exercise 3¶

1. Implement Jacobi iteration for the 5 x 5 discrete 1D Laplacian matrix $A$ and solve the linear system with RHS being a vector of all ones. Run the interation for 50 steps and print out the norm of the residual as you go. Or you can implement reasonable stopping criteria such as stopping when progress stalls or some maximum number of iterations has been exceeded.
2. Use the conjugate gradient solver cg from scipy.sparse.linalg to solve the system $A\, x = b$ where $b$ is a vector of ones and $A$ is a sparse discrete 1D Laplacian matrix from the previous problem. Print out the residual at each step using the optional argument callback=. This argument takes a function f(xk) which is called during each step and provides the user with the current guess for the solution xk.

## Condition number, error and residual¶

The condition number of a square matrix $A$ is $\operatorname{cond}(A):= || A ||\; || A^{-1} ||$.

Its importance is due to the following bounds on the errors in solving linear systems. Suppose $x$ is the true solution and $\hat{x}$ the computed solution of $A\, x = b$. Then the perturbation in the data $\Delta b$ and the error $\Delta x$ are related by $$\frac{||\Delta x||}{||x||} \le \operatorname{cond}(A)\frac{||\Delta b||}{||b||}$$

If instead of the RHS $b$ one perturbs the matrix $A$ and $E$ is this perturbation then

$$\frac{||\Delta x||}{||\hat{x}||} \le \operatorname{cond}(A)\frac{||E||}{||A||}$$

Usually the true solution is not available. One can only compute how much the computed solution $\hat{x}$ fails to satisfy the equation $A\, x = b$. This quantity $r := b - A\, \hat{x}$ is called the residual. The quantity $||r|| / (||A||\, ||\hat{x}||)$ is called the relative residual. The error and residual are related by $$\frac{||\Delta x||}{||\hat{x}||} \le \operatorname{cond}(A)\frac{||r||}{||A||\;||\hat{x}||}$$

Thus if the relative residual is small then the error can be guaranteed to be small only if the condition number is small.

The Hilbert matrix is parametrized by its size and gets increasingly ill-conditioned with size.

In [7]:
from scipy.linalg import hilbert

print hilbert(4)

[[ 1.          0.5         0.33333333  0.25      ]
[ 0.5         0.33333333  0.25        0.2       ]
[ 0.33333333  0.25        0.2         0.16666667]
[ 0.25        0.2         0.16666667  0.14285714]]

In [8]:
from numpy.linalg import cond

for n in [2, 5, 8, 11, 14]:
print cond(hilbert(n))

19.2814700679
476607.250242
15257575488.1
5.22103489302e+14
6.11155140292e+17


In contrast, for random matrices with entries drawn uniformly and independenly from the unit interval, the condition number appears to grow much more slowly. Experts in random matrix theory may have something to say about this sort of phenomena.

In [9]:
from numpy.random import rand, seed

seed(6172014)
for n in [2, 5, 8, 11, 14]:
print cond(rand(n, n))

4.95038266015
25.5190606861
48.4978814602
100.78848894
126.823572477


The impact of condition number on the error is visible very dramatically for Hilbert matrices.

In [10]:
from numpy.linalg import solve, norm
import numpy as np

for n in [2, 5, 8, 11, 14]:
A = hilbert(n)
b = np.dot(A, np.ones(n))
x = solve(A, b)
abserror = norm(np.ones(n) - x)
relerror = abserror/norm(np.ones(n))
print '%2d\t%2.5e\t%2.5e\t%2.5e' % (n, abserror, relerror, norm(b-np.dot(A,x)))

 2	5.97873e-16	4.22760e-16	0.00000e+00
5	1.53163e-11	6.84965e-12	0.00000e+00
8	1.33106e-06	4.70601e-07	4.00297e-16
11	2.21893e-02	6.69033e-03	5.20741e-16
14	7.74102e+01	2.06887e+01	3.26149e-15


Notice above that even though the residual continues to be somewhat small, the error grows dramatically because of the increasingly ill-conditioned matrices.

For the random matrices, due to the smaller condition numbers the errors don't grow so badly.

In [11]:
seed(6172014)
for n in [2, 5, 8, 11, 14]:
A = rand(n, n)
b = np.dot(A, np.ones(n))
x = solve(A, b)
abserror = norm(np.ones(n) - x)
relerror = abserror/norm(np.ones(n))
print '%2d\t%2.5e\t%2.5e\t%2.5e' % (n, abserror, relerror, norm(b-np.dot(A,x)))

 2	1.11022e-16	7.85046e-17	0.00000e+00
5	2.95828e-15	1.32298e-15	4.96507e-16
8	2.13556e-15	7.55033e-16	0.00000e+00
11	8.05428e-15	2.42846e-15	1.77636e-15
14	1.62552e-14	4.34438e-15	2.94575e-15


## Exercise 4¶

1. A square matrix is called orthogonal if it has orthogonal columns. How would you express that as a matrix equation? What is the 2-norm of an orthogonal matrix? What is its condition number?

## Preconditioning¶

To be continued ...