Project 6

Problem 1

This problem is partly the same as one of the exercises, just with more data. The data generation and manipulation are the only new parts. In this problem you will solve a simple linear regression task: you will first generate a random collection of data points, then skew and shift them in the specified manner, and finally find the line of best fit for the resulting cloud of data points.

Part 1

Pick 200 random numbers drawn independently from the standard normal distribution with mean 0 and variance 1 and put them in an array r. (To make it easy to debug and compare, let's all put the line numpy.random.seed(92111) at the start of our code, before calling any numpy.random functions.) Next, draw 200 random numbers independently from the uniform distribution on $[0, 2\pi)$ and put them into an array theta. Treating each pair r[i], theta[i] as specifying a number in polar coordinates, create the corresponding 200 points in Cartesian coordinates and plot this cloud of points as a scatter plot.
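
A minimal sketch of one way to do this (the names x and y for the Cartesian coordinates are a choice, not a requirement):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(92111)                          # shared seed so results are comparable

r = np.random.randn(200)                       # standard normal: mean 0, variance 1
theta = np.random.uniform(0, 2 * np.pi, 200)   # uniform on [0, 2*pi)

# Interpret (r[i], theta[i]) as polar coordinates and convert to Cartesian.
x = r * np.cos(theta)
y = r * np.sin(theta)

plt.scatter(x, y)
plt.axis('equal')
plt.show()
```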

Part 2

To each of the 200 Cartesian coordinate points that you created in the previous part, apply the matrix $M$ given below. Follow that by adding 2 to the $y$-coordinate of each of the resulting points. Call this final resulting cloud of points data and make a scatter plot of it.

$$ M = \begin{bmatrix} 7.07106781 & -0.70710678 \\ 7.07106781 & 0.70710678 \end{bmatrix}\, . $$

Thus data now contains the cloud of points from part 1 transformed by $M$ and shifted up by 2 units.
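
One way to organize this, reusing the x and y arrays from the part 1 sketch (stacking the points into a 2x200 array is a convention, not a requirement):

```python
M = np.array([[7.07106781, -0.70710678],
              [7.07106781,  0.70710678]])

points = np.vstack([x, y])    # 2 x 200: each column is one point
data = M @ points             # apply M to every point at once
data[1, :] += 2               # shift every y-coordinate up by 2

plt.scatter(data[0], data[1])
plt.axis('equal')
plt.show()
```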

Explain why this matrix does what it does to your cloud of points.

Part 3

Your task now is to fit a straight line $y = mx + c$ to the cloud of points in data by using least squares. Formulate this as a least squares problem $A z \simeq b$ and solve it via the resulting normal equations $A^T\, A\, z = A^T b$ using numpy.linalg.solve.
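
A sketch of the normal-equations approach, assuming data is the 2x200 array from part 2 (so the unknown vector is z = (m, c)):

```python
xd, yd = data[0], data[1]

# A z ~ b with A = [x  1], b = y, z = (m, c)
A = np.column_stack([xd, np.ones_like(xd)])
b = yd

z = np.linalg.solve(A.T @ A, A.T @ b)   # solve the normal equations
m, c = z
```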

Part 4

Now solve the above least squares problem $A z \simeq b$ using a reduced QR factorization.
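
With numpy this is a small change, since numpy.linalg.qr returns the reduced factorization by default; a sketch reusing A and b from part 3:

```python
Q, R = np.linalg.qr(A)              # reduced QR: Q is 200x2, R is 2x2
z_qr = np.linalg.solve(R, Q.T @ b)  # R is upper triangular; back substitution also works
```

The two solutions should agree to many digits; the QR route avoids forming $A^T A$, which squares the condition number of the problem.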

Part 5

Using the slope and intercept obtained by one of the above solution methods, plot a line of that slope and intercept on top of a scatter plot of data.
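
A sketch, using the m and c obtained above:

```python
xs = np.linspace(data[0].min(), data[0].max(), 100)

plt.scatter(data[0], data[1], s=10)
plt.plot(xs, m * xs + c, color='red')
plt.show()
```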


Problem 2

In this problem you will work with the SVD of an image and study the information content in it by looking at the smallest or largest singular values.

Part 1

Read the image lake.tif into an array using scipy.ndimage.imread and extract the red, green, and blue parts of the image into 2-dimensional arrays $R$, $G$, and $B$ -- the image channels. (If the imread function, which requires the PIL library, does not work on your system, we will provide the arrays as files.) You can display the resulting collection of 3 arrays as an image using matplotlib.pyplot.imshow.
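
Note that scipy.ndimage.imread was removed in recent SciPy releases; imageio.imread (or matplotlib.pyplot.imread) is a substitute. A sketch, assuming lake.tif is in the working directory:

```python
import numpy as np
import matplotlib.pyplot as plt

try:
    from scipy.ndimage import imread   # older SciPy, requires PIL
except ImportError:
    from imageio import imread         # fallback on modern installs

img = imread('lake.tif')
R, G, B = img[:, :, 0], img[:, :, 1], img[:, :, 2]   # the three channels

plt.imshow(np.dstack([R, G, B]))
plt.show()
```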

Part 2

This image has more columns than rows, and we only looked at the meaning of the SVD for matrices with more rows than columns (although the SVD makes sense even for the other kind of matrices). In this problem we will work with the transposes of $R, G, B$ to stay within the realm of what we studied in lecture. Compute a reduced SVD of the 3 arrays $R^T, G^T, B^T$. This will give you 3 matrices for each. To test your results, do the multiplication needed to recreate the three channels, combine the channels, and display the resulting image. For convenience in referring to the various objects, we will refer to the reduced SVD matrices of $R^T$ as $\hat{U}_r, \hat{\Sigma}_r, V^T_r$, and similarly with subscripts $g$ and $b$.
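
A possible sketch using numpy.linalg.svd, where full_matrices=False gives the reduced factorization (the variable names mirror the $\hat{U}$, $\hat{\Sigma}$, $V^T$ notation; converting to float avoids integer arithmetic surprises):

```python
Ur, Sr, Vtr = np.linalg.svd(R.T.astype(float), full_matrices=False)
Ug, Sg, Vtg = np.linalg.svd(G.T.astype(float), full_matrices=False)
Ub, Sb, Vtb = np.linalg.svd(B.T.astype(float), full_matrices=False)

# Sanity check: rebuild each channel and reassemble the image.
# (Ur * Sr) scales column j of Ur by Sr[j], i.e. it equals Ur @ np.diag(Sr).
Rt = (Ur * Sr) @ Vtr
Gt = (Ug * Sg) @ Vtg
Bt = (Ub * Sb) @ Vtb

img2 = np.dstack([Rt.T, Gt.T, Bt.T])   # transpose back to the original orientation
plt.imshow(img2.clip(0, 255).astype(np.uint8))
plt.show()
```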

Part 3

For $n$ in $\{200,100,50,10,1\}$, construct images using the largest $n$ singular vectors and values (the first $n$ columns of $\hat{U}_r, \hat{U}_g, \hat{U}_b$, the first $n$ rows of $V^T_r, V^T_g, V^T_b$, and the first $n$ singular values). Make sure the resulting images have the same dimensions as the original image. Display the resulting images. You will see that the image stays recognizable and similar to the original image even when very few singular values are used.
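
A sketch of the rank-$n$ reconstruction, continuing from the part 2 sketch; the matrix product keeps the output the same shape as the original channel:

```python
for n in [200, 100, 50, 10, 1]:
    # First n columns of U-hat, first n singular values, first n rows of V^T.
    Rn = (Ur[:, :n] * Sr[:n]) @ Vtr[:n, :]
    Gn = (Ug[:, :n] * Sg[:n]) @ Vtg[:n, :]
    Bn = (Ub[:, :n] * Sb[:n]) @ Vtb[:n, :]

    approx = np.dstack([Rn.T, Gn.T, Bn.T])
    plt.imshow(approx.clip(0, 255).astype(np.uint8))
    plt.title('largest n = %d' % n)
    plt.show()
```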

Part 4

Repeat the previous task but with the smallest $n$ singular vectors and values. The images will not look like the original image.
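
Only the slicing changes; for instance, for the red channel:

```python
# Last n columns, values, and rows instead of the first n.
Rn = (Ur[:, -n:] * Sr[-n:]) @ Vtr[-n:, :]
```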

Part 5

Plot the singular values for all the channels. You should see a very rapid drop in the size of the singular values, which justifies using only a few of the largest singular values and vectors. Thus much of the information content of the image is in the largest singular vectors and values.
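
A log scale makes the drop easy to see; a sketch:

```python
plt.semilogy(Sr, label='red')
plt.semilogy(Sg, label='green')
plt.semilogy(Sb, label='blue')
plt.xlabel('index')
plt.ylabel('singular value')
plt.legend()
plt.show()
```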


Problem 3


Recall that the discrete Laplacian is the tridiagonal matrix: $$ \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & -1 & \\ & & \ddots & \ddots & \ddots \\ & & & -1 & 2 \end{bmatrix} $$ One way this shows up is in the finite differences approximation of the 1D Poisson equation: $$ -\Delta u = -u_{xx} = f $$

on the interval $[0,1]$ with boundary conditions $u(0)=0$ and $u(1)=0$.

We can approximate $u_{xx}$ by creating a uniform grid of $N$ points in the interval $[0,1]$, spaced a distance $h = 1/(N-1)$ apart. The finite differences formula for this is given by: $$ -u_{xx} \approx - \frac{u_{i-1}-2u_i+u_{i+1}}{h^2} $$

Notice that, based on this formula, the discrete Laplacian must be scaled by $1/h^2$. So we'll make the slight change and work with the matrix: $$ A=\frac{1}{h^2} \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & -1 & \\ & & \ddots & \ddots & \ddots \\ & & & -1 & 2 \end{bmatrix} $$

It is easy to check that the eigenfunctions of $-\Delta$ (i.e., of the negative second derivative operator in the 1D case), subject to the above boundary conditions on the unit interval, are $\sin(n\pi x)$ with eigenvalues $n^2\pi^2$ for all $n\ge 1$.

The goal of this problem is to use the eigvalsh function in scipy.linalg to check how closely we approximate the eigenvalues of the continuous problem using the discrete $N \times N$ Laplacian as $N$ grows. If you want to study the eigenvectors too, use eigh. Ideally you should be working with sparse matrices, so you should consider using scipy.sparse.linalg.eigsh.

For simplicity, you could just check how well, say, the first 6 eigenvalues are approximated as $N$ grows.

Note: eigvalsh should give you the eigenvalues in sorted order. So, you can just take the first 6 entries.
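
A sketch of the sparse approach (the values of $N$ tried here are an arbitrary choice; sigma=0 tells eigsh to find the eigenvalues closest to 0, i.e. the smallest ones):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

exact = (np.arange(1, 7) * np.pi) ** 2   # n^2 pi^2 for n = 1..6

for N in [10, 100, 1000]:
    h = 1.0 / (N - 1)
    # Sparse tridiagonal Laplacian, scaled by 1/h^2.
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N), format='csc') / h**2
    vals = eigsh(A, k=6, sigma=0, return_eigenvectors=False)
    print(N, np.sort(vals) - exact)     # error in the first 6 eigenvalues
```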


Problem 4

This is a continuation of the Markov chain problem from one of the earlier projects, with a slightly larger transition matrix. Consider the Markov chain with transition matrix

$$ A = \begin{bmatrix} 0.8 & 0.2 & 0.1\\ 0.1 & 0.7 & 0.3\\ 0.1 & 0.1 & 0.6 \end{bmatrix} $$

In that earlier problem, and in the Wikipedia article linked there, the transition matrices are written such that the sum of each row is 1. In that case all the discussion and problems below can be transformed by transposing. In this problem we wrote the transition matrix with the sum of each column equal to 1, in order to represent transitions from state $x$ as $A\; x$ rather than $x^T\; A$.

Recall that the entry $(i, j)$ of this matrix gives the probability of going from state $j$ to state $i$. Let a 3-vector $x^{(k)}$ whose entries sum to 1 represent the probabilities of being in the three states after step $k$.

If the initial probability distribution is $x^{(0)}$ then the probability distribution after $k$ steps (transitions) is

$$ x^{(k)} = A\; x^{(k-1)} = A^k\; x^{(0)} $$
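
A sketch for experimenting with this iteration (50 steps is an arbitrary cutoff, chosen to be large enough that the iterates stop changing visibly):

```python
import numpy as np

A = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.7, 0.3],
              [0.1, 0.1, 0.6]])

x = np.array([1.0, 0.0, 0.0])   # start in state 0
for k in range(50):
    x = A @ x                   # one transition step
print(x)                        # distribution after 50 steps

print(np.linalg.matrix_power(A, 50))   # compare with the columns of A^k
```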

Suppose the system starts in state 0 (states are numbered 0, 1, 2).

  1. What is the long term probability distribution vector?
  2. Does it depend on the starting vector?
  3. What is the limit $\lim_{k\to \infty} A^k$, and what is the rank of this limiting matrix?
  4. Explain your result from the previous part in terms of eigenvalues and eigenvectors of $A$.
  5. Must 1 always be an eigenvalue of the transition matrix of a Markov chain? Why?
  6. A probability distribution vector $x$ is called stationary if $A\; x = x$. How can you determine a stationary $x$?

This problem is from Scientific Computing by Michael T. Heath.