The *least squares* approach is to do this by minimizing the 2-norm $\|b - A\;x\|$ of the residual (hence the name least squares).

When $m > n$ (more equations than unknowns) we speak of the *overdetermined* least squares problem. The case $m < n$ is called *underdetermined*.

Since we want the residual $r = b - A\; x$ to be orthogonal to the image of $A$, this is the same as requiring $r \in (\operatorname{im} A)^\perp = \operatorname{ker} A^T$. Thus we want

$$ \begin{align} A^T \; r &= 0\\ A^T\; (b - A\; x) &=0\\ A^T\; b - A^T\;A\; x &= 0 \end{align} $$

Thus the least squares problem can be solved if we can solve the square linear system

$$ A^T\; A\; x = A^T\; b $$

This square system is called the *normal equation*.

The matrix $(A^T\;A)^{-1}\, A^T$ is called the *pseudoinverse* of $A$, often written $A^+$. Of course, typically we will not use inverses and will instead solve the normal equations as a linear system.
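As a minimal sketch of the normal equation method (the matrix and right-hand side here are made-up illustrative data), we can solve $A^T A\, x = A^T b$ with an ordinary linear solver and check that the residual is orthogonal to the columns of $A$:

```python
import numpy as np

# An overdetermined system: 5 equations, 2 unknowns (illustrative data)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.1, 2.9, 4.2, 4.8])

# Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x

print(x)
print(A.T @ r)   # entries should be near zero: r is orthogonal to im A
```

Note that we never form the pseudoinverse explicitly; the linear solve does the same job more stably and cheaply.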

- Compute the conditioning of $A$ and $A^T\; A$ for a few random matrices $A$. How do you think the conditioning of $A$ and $A^T\;A$ is related in general?
- Use the normal equation method to solve a least squares problem that finds the "best fit" line through the dataset:
  x = [-0.01, 0.078, 0.191, 0.318, 0.467, 0.54, 0.694, 0.76, 0.898, 0.988]
  y = [0.988, 1.056, 1.142, 1.156, 1.221, 1.279, 1.349, 1.367, 1.463, 1.536]

Although this is commonly referred to as linear regression, you can see that it really isn't exclusive to lines. You could easily adapt the code to fit quadratics, cubics and so on.

- Repeat the above exercise using the least squares solver `lstsq` from `scipy.linalg`.
- Repeat the exercise using `numpy.polyfit(x, y, deg)` to compute the best fit line instead.
- Finally, repeat using `scipy.stats.linregress`. It provides you with information about how good the fit is.
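For orientation, here is a sketch of how these three routines can be called on the dataset above; the variable names are mine, and the design matrix for a line has columns $[1, x]$:

```python
import numpy as np
from scipy.linalg import lstsq
from scipy.stats import linregress

x = np.array([-0.01, 0.078, 0.191, 0.318, 0.467,
              0.54, 0.694, 0.76, 0.898, 0.988])
y = np.array([0.988, 1.056, 1.142, 1.156, 1.221,
              1.279, 1.349, 1.367, 1.463, 1.536])

# Design matrix for a line: columns [1, x]
A = np.column_stack([np.ones_like(x), x])

# scipy.linalg.lstsq returns the solution plus rank/singular-value info
coef, res, rank, sv = lstsq(A, y)

# numpy.polyfit returns coefficients with the highest degree first
slope_pf, intercept_pf = np.polyfit(x, y, 1)

# scipy.stats.linregress also reports the correlation coefficient etc.
fit = linregress(x, y)

print(coef)                      # [intercept, slope]
print(intercept_pf, slope_pf)
print(fit.intercept, fit.slope, fit.rvalue)
```

All three should agree on the slope and intercept; they differ in what extra diagnostic information they return.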

What we *can* do is replace the matrix $A$ by another matrix which has the same image but is much better conditioned.

What matrix has the best conditioning? An orthogonal matrix!
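Indeed, all singular values of an orthogonal matrix equal 1, so its 2-norm condition number is exactly 1. A quick check on the orthogonal factor of a random matrix:

```python
import numpy as np

# Take Q from the QR factorization of a random matrix; Q is orthogonal
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

print(np.linalg.cond(Q))   # ≈ 1.0 (up to rounding error)
```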

This leads to the *QR factorization* of $A$. For every full-rank tall matrix $A$ of size $m \times n$ one can find an orthogonal matrix $Q$ of size $m \times m$ and an upper "triangular" matrix $R$ of size $m \times n$ (its top $n \times n$ block is upper triangular and the remaining rows are zero) such that $A = Q\; R$.

There are two variants: the *reduced* (also known as *economy* or *economic*) and the *full* QR factorizations. The pictures below capture the essential features of the two.

The following example computes the reduced and full QR factorizations.

In [1]:

```
import numpy as np
from scipy.linalg import qr

np.random.seed(6142016)

A = np.random.rand(3, 2)
Q, R = qr(A, mode='economic')   # reduced QR: Q is 3x2, R is 2x2
print('Economic QR')
print()
print(Q)
print()
print(R)
print()

A = np.random.rand(3, 2)
Q, R = qr(A, mode='full')       # full QR: Q is 3x3, R is 3x2
print('Full QR')
print()
print(Q)
print()
print(R)
print()
```

Substituting $A = \hat{Q}\;\hat{R}$ into the normal equations gives $\hat{R}^T\;\hat{Q}^T\;\hat{Q}\;\hat{R}\;x = \hat{R}^T\;\hat{Q}^T\;b$, and since $\hat{Q}$ has orthonormal columns, $\hat{Q}^T\hat{Q} = I$. Then cancelling the $\hat{R}^T$, which we can do since $\hat{R}$ is nonsingular ($A$ was assumed to be full rank), we get the system to solve

$$ \hat{R} \; x = \hat{Q}^T\; b $$

- Solve the linear regression problem from the last exercise, but this time using reduced QR factorization.
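A sketch of the QR-based solve on the same dataset, using `solve_triangular` from `scipy.linalg` for the triangular system (the variable names here are mine):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

x = np.array([-0.01, 0.078, 0.191, 0.318, 0.467,
              0.54, 0.694, 0.76, 0.898, 0.988])
y = np.array([0.988, 1.056, 1.142, 1.156, 1.221,
              1.279, 1.349, 1.367, 1.463, 1.536])
A = np.column_stack([np.ones_like(x), x])

# Reduced QR: Qhat is m x n with orthonormal columns, Rhat is n x n upper triangular
Qhat, Rhat = qr(A, mode='economic')

# Solve Rhat x = Qhat^T y by back substitution
coef = solve_triangular(Rhat, Qhat.T @ y)
print(coef)   # [intercept, slope]
```

Because we never form $A^T A$, this avoids squaring the condition number of $A$.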

*Reduced and full SVD were covered in lecture. Will be added here later.*