Least squares polynomial approximation
Nikolai Shokhirev
March 22, 2013
Problem statement
Given $L$ data points $\{x_{l},y_{l}\}=(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{L},y_{L})$,
the problem is to find the polynomial
\begin{equation}
f(x)=\sum_{n=0}^{N}a_{n}x^{n}\label{eq:fit_x}
\end{equation}
which minimizes the sum of squared residuals
\begin{equation}
R=\sum_{l=1}^{L}\left[f(x_{l})-y_{l}\right]^{2}\label{eq:res}
\end{equation}
For $N+1\lt L$ this is called polynomial regression [1], and
for $N+1=L$ it is called polynomial interpolation [2].
Equation (\ref{eq:fit_x}) can be rewritten in terms of the monomial
functions (for easier generalization later on)
\begin{equation}
M_{n}(x)=x^{n}\label{eq:mono}
\end{equation}
as
\begin{equation}
f(x)=\sum_{n=0}^{N}a_{n}M_{n}(x)\equiv\vec{a}^{\intercal}\cdot\vec{M}(x)\label{eq:fit_m}
\end{equation}
Substituting (\ref{eq:fit_m}) into (\ref{eq:res}), taking the
derivatives with respect to $a_{k}$, and setting them to zero, we get
\begin{equation}
\sum_{n=0}^{N}\sum_{l=1}^{L}M_{k}(x_{l})M_{n}(x_{l})a_{n}=\sum_{l=1}^{L}M_{k}(x_{l})y_{l}\label{eq:ls_eqm}
\end{equation}
or
\begin{equation}
\hat{M}^{\intercal}\cdot\hat{M}\cdot\vec{a}=\hat{M}^{\intercal}\cdot\vec{y}\label{eq:ls_eqmat}
\end{equation}
where
\begin{equation}
M_{l,k}=M_{k}(x_{l})=x_{l}^{k}\label{eq:m_mat}
\end{equation}
This is an $L\times(N+1)$ Vandermonde matrix [3].
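For illustration, the matrix (\ref{eq:m_mat}) can be assembled directly in NumPy (the data points below are hypothetical); `np.vander` with `increasing=True` produces exactly the $L\times(N+1)$ layout of (\ref{eq:m_mat}):

```python
import numpy as np

# Hypothetical data: L = 7 abscissae, fitting order N = 2
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
N = 2

# L x (N+1) Vandermonde matrix with M[l, k] = x_l**k
M = np.vander(x, N + 1, increasing=True)
print(M.shape)  # (7, 3)
```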
Solution
A formal solution of (\ref{eq:ls_eqmat}) is
\begin{equation}
\vec{a}=\left(\hat{M}^{\intercal}\cdot\hat{M}\right)^{-1}\cdot\hat{M}^{\intercal}\cdot\vec{y}\label{eq:m_solution}
\end{equation}
if the matrix $\hat{M}^{\intercal}\cdot\hat{M}$ is invertible. In
this case the optimal coefficients are
\begin{equation}
a_{n}=\sum_{l=1}^{L}z_{n,l}^{N}\, y_{l}
\end{equation}
where
\begin{equation}
z_{n,l}^{N}=\sum_{k=0}^{N}\left[\left(\hat{M}^{\intercal}\cdot\hat{M}\right)^{-1}\right]_{n,k}M_{k}(x_{l})
\end{equation}
Any conventional method can be used for the matrix inversion, e.g.,
LU decomposition. This solution is usually recommended in the literature
(see, e.g., [4]).
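As a minimal sketch of this recipe (with made-up data), the normal equations can be solved with NumPy; `np.linalg.solve` uses an LU factorization internally, so no explicit inverse is formed:

```python
import numpy as np

def fit_normal_equations(x, y, N):
    """Fit an order-N polynomial by solving (M^T M) a = M^T y."""
    M = np.vander(x, N + 1, increasing=True)
    return np.linalg.solve(M.T @ M, M.T @ y)

# Hypothetical noisy quadratic data
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.01 * rng.standard_normal(x.size)

a = fit_normal_equations(x, y, N=2)
print(a)  # close to [1.0, 2.0, -0.5]
```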
Better solution
The matrix (\ref{eq:m_mat}) can be factorized as the Singular value
decomposition (SVD)
\begin{equation}
\hat{M}=\hat{U}\cdot\hat{\sigma}\cdot\hat{V}^{\intercal}\equiv\sum_{s}\sigma_{s}\vec{u}_{s}\cdot\vec{v}_{s}^{\intercal}
\end{equation}
Here only the non-zero singular values $\sigma_{s}$ are included. The
main least squares equation (\ref{eq:ls_eqmat}) can now be rewritten
as
\begin{equation}
\sum_{s}\sigma_{s}^{2}\vec{v}_{s}\cdot\vec{v}_{s}^{\intercal}\cdot\vec{a}=\sum_{s}\sigma_{s}\vec{v}_{s}\cdot\vec{u}_{s}^{\intercal}\cdot\vec{y}
\end{equation}
or, using the orthogonality of the singular vectors and dividing by
$\sigma_{s}$, we can equate the components
\begin{equation}
\sigma_{s}\vec{v}_{s}^{\intercal}\cdot\vec{a}=\vec{u}_{s}^{\intercal}\cdot\vec{y}
\end{equation}
The solution now is
\begin{equation}
\vec{a}=\sum_{s}\sigma_{s}^{-1}\vec{v}_{s}\cdot\left(\vec{u}_{s}^{\intercal}\cdot\vec{y}\right)\label{eq:svd_solution}
\end{equation}
Note, that the condition number of this problem is
\begin{equation}
\kappa=\frac{\max_{s}\sigma_{s}}{\min_{s}\sigma_{s}}
\end{equation}
Here all zero singular values are already excluded. In contrast, the
condition number of problem (\ref{eq:ls_eqmat}) is $\kappa^{2}$, which
can be much larger, so the solution (\ref{eq:m_solution}) is less
accurate and less stable.
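A sketch of the SVD route under the same assumptions (hypothetical data; the `rcond` cutoff for discarding near-zero singular values is an assumed tolerance):

```python
import numpy as np

def fit_svd(x, y, N, rcond=1e-12):
    """Fit via eq. (svd_solution): a = sum_s v_s (u_s^T y) / sigma_s."""
    M = np.vander(x, N + 1, increasing=True)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > rcond * s[0]          # exclude (near-)zero singular values
    a = Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])
    return a, s

x = np.linspace(0.0, 1.0, 50)
y = 1.0 - x + x**2
a, s = fit_svd(x, y, N=8)
print(s[0] / s[-1])        # kappa of M
print((s[0] / s[-1])**2)   # kappa^2: conditioning of the normal equations
```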
The best solution
On the other hand, instead of the monomials (\ref{eq:mono}) in (\ref{eq:fit_m}),
any linearly independent set of polynomials can be used:
\begin{equation}
f(x)=\sum_{n=0}^{N}b_{n}P_{n}(x)\label{eq:fit_p}
\end{equation}
Instead of (\ref{eq:ls_eqm}) we then have
\begin{equation}
\sum_{n=0}^{N}\sum_{l=1}^{L}P_{k}(x_{l})P_{n}(x_{l})b_{n}=\sum_{l=1}^{L}P_{k}(x_{l})y_{l}\label{eq:ls_eqp}
\end{equation}
We can take advantage of the freedom of choice of the polynomials
and require that
\begin{equation}
\sum_{l=1}^{L}P_{k}(x_{l})P_{n}(x_{l})\equiv\left\langle P_{k}|P_{n}\right\rangle =\delta_{n,k}\label{eq:p_ort}
\end{equation}
In other words, the polynomials are orthonormal on the set $\{x_{1},x_{2},\ldots,x_{L}\}$.
The brackets denote the scalar product of $L$-component vectors. The
condition (\ref{eq:p_ort}) reduces (\ref{eq:ls_eqp}) to
\begin{equation}
b_{k}=\sum_{l=1}^{L}P_{k}(x_{l})y_{l}=\left\langle P_{k}|Y\right\rangle
\end{equation}
There is no need for matrix inversion. If, in addition, we require
that $P_{k}(x)$ is exactly of order $k$, then
\begin{equation}
f(x)=\sum_{k=0}^{K}b_{k}P_{k}(x)
\end{equation}
is automatically the least squares fitting polynomial of order $K$
for any $K\le N$: the lower coefficients $b_{k}$ do not change when
the expansion is truncated. If the basis polynomials are defined as
\begin{equation}
P_{k}(x)=\sum_{i=0}^{k}c_{k,i}M_{i}(x)\label{eq:ort_poly}
\end{equation}
then the coefficients are
\begin{equation}
a_{i}=\sum_{k=i}^{K}b_{k}c_{k,i}
\end{equation}
How to build
The relation (\ref{eq:ort_poly}) can be rewritten as
\begin{equation}
\left(\begin{array}{c}
P_{0}(x)\\
P_{1}(x)\\
P_{2}(x)\\
\vdots\\
P_{N}(x)
\end{array}\right)=\left(\begin{array}{ccccc}
c_{0,0} & 0 & 0 & \cdots & 0\\
c_{1,0} & c_{1,1} & 0 & \cdots & 0\\
c_{2,0} & c_{2,1} & c_{2,2} & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
c_{N,0} & c_{N,1} & c_{N,2} & \cdots & c_{N,N}
\end{array}\right)\cdot\left(\begin{array}{c}
M_{0}(x)\\
M_{1}(x)\\
M_{2}(x)\\
\vdots\\
M_{N}(x)
\end{array}\right)
\end{equation}
or
\begin{equation}
\vec{P}(x)=\hat{C}\cdot\vec{M}(x)
\end{equation}
The orthogonal polynomials can be built recursively using the Gram-Schmidt process [5]:
Initialization:
\begin{equation}
P_{0}(x)=M_{0}(x)/\sqrt{\left\langle M_{0}|M_{0}\right\rangle }\label{eq:init}
\end{equation}
Recursion:
\begin{equation}
P_{n+1}(x)=M_{n+1}(x)-\sum_{k=0}^{n}\left\langle M_{n+1}|P_{k}\right\rangle P_{k}(x)\label{eq:recur}
\end{equation}
Normalization:
\begin{equation}
P_{n+1}(x)\Leftarrow P_{n+1}(x)/\sqrt{\left\langle P_{n+1}|P_{n+1}\right\rangle }\label{eq:norm}
\end{equation}
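Putting (\ref{eq:init})-(\ref{eq:norm}) together, here is a compact NumPy sketch of the whole orthogonal-polynomial fit (function and variable names are mine; the test data are hypothetical):

```python
import numpy as np

def orthonormal_poly_fit(x, y, N):
    """Build polynomials orthonormal on the grid x by Gram-Schmidt,
    then compute b_k = <P_k|Y> and convert to monomial coefficients a."""
    Mvals = np.vander(x, N + 1, increasing=True)  # columns are M_k(x_l)
    Pvals = np.zeros_like(Mvals, dtype=float)     # columns will be P_k(x_l)
    C = np.zeros((N + 1, N + 1))                  # P = C . M, lower triangular

    # Initialization: P_0 = M_0 / sqrt(<M_0|M_0>)
    norm0 = np.sqrt(Mvals[:, 0] @ Mvals[:, 0])
    Pvals[:, 0] = Mvals[:, 0] / norm0
    C[0, 0] = 1.0 / norm0

    for n in range(N):
        # Recursion: subtract projections of M_{n+1} onto P_0..P_n
        p = Mvals[:, n + 1].copy()
        c = np.zeros(N + 1)
        c[n + 1] = 1.0
        for k in range(n + 1):
            proj = Mvals[:, n + 1] @ Pvals[:, k]  # <M_{n+1}|P_k>
            p -= proj * Pvals[:, k]
            c -= proj * C[k]
        # Normalization
        nrm = np.sqrt(p @ p)
        Pvals[:, n + 1] = p / nrm
        C[n + 1] = c / nrm

    b = Pvals.T @ y   # b_k = <P_k|Y>, no matrix inversion needed
    a = C.T @ b       # a_i = sum_k b_k c_{k,i}
    return a, b

x = np.linspace(-1.0, 1.0, 21)
y = 0.5 + x**3
a, b = orthonormal_poly_fit(x, y, N=3)
print(np.round(a, 6))  # close to [0.5, 0, 0, 1]
```

For large $N$, classical Gram-Schmidt as written above can itself lose orthogonality in floating point; a modified Gram-Schmidt loop (projecting the running residual $p$ instead of the original $M_{n+1}$) is the usual remedy.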
Savitzky-Golay smoothing filter
For the case of equidistant abscissae, Savitzky and Golay [6]
developed digital filters for smoothing signals and calculating derivatives.
The filters are based on local polynomial regressions of order $N\lt 2m$
over windows of $L=2m+1$ points. Gorry [7] generalized the Savitzky-Golay
filter to the end points using the orthogonal polynomial approach
("The best solution"). Furthermore, he derived closed analytical
formulae for the polynomials. This subject will be discussed in a subsequent
article.
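SciPy ships an implementation of these filters; a short usage sketch with illustrative parameters (a window of $L=11$ points corresponds to $m=5$):

```python
import numpy as np
from scipy.signal import savgol_filter

# Hypothetical noisy signal on an equidistant grid
t = np.linspace(0.0, 4.0 * np.pi, 200)
rng = np.random.default_rng(1)
noisy = np.sin(t) + 0.1 * rng.standard_normal(t.size)

# Local regressions over L = 2m+1 = 11 points, polynomial order N = 3
smoothed = savgol_filter(noisy, window_length=11, polyorder=3)

# First derivative estimate; delta is the grid spacing
deriv = savgol_filter(noisy, 11, 3, deriv=1, delta=t[1] - t[0])
```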
References
[1] Wikipedia: Polynomial regression.
[2] Wikipedia: Polynomial interpolation.
[3] Wikipedia: Vandermonde matrix.
[4] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing.
[5] Wikipedia: Gram-Schmidt process.
[6] A. Savitzky and M. J. E. Golay, "Smoothing and Differentiation of Data by Simplified Least Squares Procedures," Analytical Chemistry, vol. 36, pp. 1627-1639, 1964.
[7] P. A. Gorry, "General Least-Squares Smoothing and Differentiation by the Convolution (Savitzky-Golay) Method," Analytical Chemistry, vol. 62, pp. 570-573, 1990.