We started out by showing how a $PLU$ decomposition can be used to solve a dense linear system. Another common factorization we mentioned was the $QR$ decomposition. Try setting up a method to solve a system using $QR$-factorization instead of $PLU$-factorization.
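One way to sketch this: if $A = QR$ with $Q$ orthogonal and $R$ upper triangular, then $Ax = b$ becomes $Rx = Q^Tb$, which is a single back-substitution. The helper name `qr_solve` below is my own; the sketch assumes `numpy` and `scipy` are available.

```python
import numpy as np
from scipy.linalg import solve_triangular

def qr_solve(A, b):
    """Solve Ax = b via A = QR, so Rx = Q^T b (a triangular solve)."""
    Q, R = np.linalg.qr(A)               # reduced QR factorization
    return solve_triangular(R, Q.T @ b)  # back-substitution on upper-triangular R

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x = qr_solve(A, b)
print(np.allclose(A @ x, b))  # True for a nonsingular A
```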

*We'll see this factorization turns out to be useful in the linear least squares problem!*

Consider the following matrix:

$$ A = \begin{bmatrix} 100 & 100 \\ 100 & 100 + \epsilon \end{bmatrix} $$

Use the `cond` function from `numpy.linalg` to estimate the condition number of $A$ as you vary $\epsilon$. Try solving $Ax=b$, computing the norm of the residual, and seeing how it relates to the condition number.
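A minimal sketch of the experiment (the choice of $\epsilon$ values and $b$ here is mine):

```python
import numpy as np

b = np.array([1.0, 2.0])
kappas = []
for eps in (1.0, 1e-4, 1e-8):
    A = np.array([[100.0, 100.0],
                  [100.0, 100.0 + eps]])
    kappas.append(np.linalg.cond(A))          # 2-norm condition number estimate
    x = np.linalg.solve(A, b)
    r = np.linalg.norm(A @ x - b)             # residual norm
    print(f"eps={eps:8.1e}  cond(A)={kappas[-1]:10.3e}  ||Ax-b||={r:.3e}")
```

Notice that the residual can stay tiny even while the condition number blows up: a small residual does not guarantee a small error in $x$ when $A$ is ill-conditioned.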

Write the CSR representation for the matrix:

$$ A = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix} $$

*Since we're on the topic, think about how you might implement matrix multiplication for the COO or CSR matrix formats.*
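Once you've written out your answer by hand, one way to check it is against `scipy.sparse`, whose `csr_matrix` exposes the three CSR arrays directly (`data`, `indices`, `indptr`):

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 3],
                         [0, 5, 0],
                         [0, 0, 2]]))
print(A.data)     # nonzero values, row by row: [1 3 5 2]
print(A.indices)  # column index of each value:  [0 2 1 2]
print(A.indptr)   # row i's values live in data[indptr[i]:indptr[i+1]]: [0 2 3 4]
```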

At each step, the GMRES and CG methods attempt to minimize a particular quantity. As mentioned, GMRES attempts to minimize the norm of the residual over the current subspace. On the other hand, CG attempts to minimize the $A$-norm, or energy norm, of the error $e$, given by $\sqrt{(Ae,e)}$. In this problem, we'll check how well CG minimizes the residual to see if these are genuinely different strategies.

First, create an SPD matrix, either randomly or of your own choosing. Then choose a particular $x$ and compute $b=Ax$. We're going to use this as a known solution.

Use CG and the `callback(xk)` function to record and plot the energy norm of the error and the norm of the residual throughout the iteration. (By error, I mean the difference between our true solution `x` and the current guess `xk` provided by CG's callback function. For the residual, you can compute `A xk - b`.)
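Here's one possible setup, assuming `scipy` is available; the SPD construction `M M^T + nI` and the problem size are my own choices. (To plot, you could pass `energy_norms` and `residual_norms` to `matplotlib.pyplot.semilogy`.)

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # SPD by construction
x_true = rng.standard_normal(n)
b = A @ x_true                     # so x_true is a known solution

energy_norms, residual_norms = [], []

def callback(xk):
    e = x_true - xk
    energy_norms.append(np.sqrt(e @ A @ e))        # A-norm of the error
    residual_norms.append(np.linalg.norm(A @ xk - b))

x, info = cg(A, b, callback=callback)
```

In exact arithmetic CG decreases the energy norm of the error monotonically at every step, while the residual norm is allowed to wiggle, which is exactly the contrast this problem is after.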

The behavior of GMRES can be a little subtle. First, try using GMRES on a random $40 \times 40$ matrix. You'll notice that it generally requires nearly all $40$ iterations... This is no good, as we may as well have just solved a linear least squares problem to begin with, since GMRES ends up seeing almost the entire space.
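One way to count iterations is through the callback, using `callback_type='pr_norm'` so scipy's `gmres` invokes it once per inner iteration (a sketch; the counting-by-callback trick is my own choice):

```python
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(0)
n = 40
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# restart=n with maxiter=1 gives one full (unrestarted) GMRES cycle;
# the callback fires once per inner iteration, so len(iters) counts them.
iters = []
gmres(A, b, restart=n, maxiter=1,
      callback=lambda rk: iters.append(rk), callback_type='pr_norm')
print(len(iters))  # typically close to 40 for a random dense matrix
```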

It turns out that GMRES's performance is closely related to how close $A$ is to being normal and how well clustered its eigenvalues are away from $0$. More precisely, its behavior at the $k$-th step is determined by how effectively we can find a polynomial $p$ of degree at most $k$ with $p(0)=1$ which is "small" when evaluated at all of $A$'s eigenvalues. A useful consequence of this is that the maximum number of iterations is at most the degree of the minimal polynomial of $A$.

Conduct a brief set of tests which compare the number of iterations GMRES requires on the following matrices:

- The matrix with diagonal 1, 2, 3, ... , 40.
- The matrix with diagonal 101, 102, 103, ... , 140.
- The matrix with diagonal 10001, 10002, 10003, ... , 10040.
- A matrix with 5 small clusters of 10 distinct eigenvalues each at 10, 20, 30, 40 and 50. Try playing with how tightly packed the clusters are and seeing if it makes a difference.
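A sketch of these tests, assuming `scipy`; the helper `gmres_iters` is not part of scipy, and the cluster spreads are values I picked to play with:

```python
import numpy as np
from scipy.sparse.linalg import gmres

def gmres_iters(A, b):
    """Count inner GMRES iterations via the callback (hypothetical helper)."""
    count = []
    gmres(A, b, restart=len(b), maxiter=1,
          callback=lambda rk: count.append(rk), callback_type='pr_norm')
    return len(count)

rng = np.random.default_rng(0)
n = 40
b = rng.standard_normal(n)

counts = []
for shift in (0, 100, 10000):
    A = np.diag(np.arange(1, n + 1, dtype=float) + shift)
    counts.append(gmres_iters(A, b))
    print(f"diagonal {shift + 1}..{shift + n}: {counts[-1]} iterations")

# five clusters of 10 eigenvalues each, near 10, 20, 30, 40, 50
centers = np.repeat([10.0, 20.0, 30.0, 40.0, 50.0], 10)
for spread in (1e-3, 1e-1, 1.0):
    A = np.diag(centers + spread * rng.uniform(-1, 1, size=50))
    print(f"cluster spread {spread}: {gmres_iters(A, rng.standard_normal(50))} iterations")
```

The shifted diagonals all have $40$ distinct eigenvalues, but shifting them far from $0$ makes them relatively well clustered, so the iteration counts should drop sharply; tightly packed clusters behave almost like $5$ repeated eigenvalues.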

Finally, experiment with how the number of steps is affected by the deviation of $A$ from normality. (To measure this, you can compute the norm of $AA^T - A^TA$.)
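One way to isolate the effect: add a strictly upper-triangular perturbation to a fixed diagonal, so the eigenvalues never change while the departure from normality grows. This setup, and the helper `gmres_iters`, are my own choices for the sketch:

```python
import numpy as np
from scipy.sparse.linalg import gmres

def gmres_iters(A, b):
    """Count inner GMRES iterations via the callback (hypothetical helper)."""
    count = []
    gmres(A, b, restart=len(b), maxiter=1,
          callback=lambda rk: count.append(rk), callback_type='pr_norm')
    return len(count)

rng = np.random.default_rng(0)
n = 40
D = np.diag(np.linspace(10.0, 20.0, n))        # fixed, well-separated spectrum
N = np.triu(rng.standard_normal((n, n)), k=1)  # strictly upper triangular part
b = rng.standard_normal(n)

deps, counts = [], []
for c in (0.0, 1.0, 5.0):
    A = D + c * N                              # same eigenvalues for every c
    deps.append(np.linalg.norm(A @ A.T - A.T @ A))
    counts.append(gmres_iters(A, b))
    print(f"c={c}: ||AA^T - A^TA|| = {deps[-1]:9.2e}, {counts[-1]} iterations")
```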

*Try this one if you're interested. I put "involved?" as open-ended problems can easily become too time consuming, so consider it optional. Not that the rest of the problems aren't optional too. :-)*

*To give credit where credit is due: this exercise was inspired by one I learned of through Luke Olson. If you're interested in learning more about these kinds of things, the CS department offers a course, CS 556, which covers iterative solvers.*