Least squares polynomial approximation

Nikolai Shokhirev

March 22, 2013

Problem statement

Given $L$ data points $\{x_{l},y_{l}\}=(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{L},y_{L})$ . The problem is to find the polynomial \begin{equation} f(x)=\sum_{n=0}^{N}a_{n}x^{n}\label{eq:fit_x} \end{equation} which minimizes the sum of squared residuals \begin{equation} R=\sum_{l=1}^{L}\left[f(x_{l})-y_{l}\right]^{2}\label{eq:res} \end{equation} For $N+1\lt L$ it is called a polynomial regression [1] and for $N+1=L$ it is called a polynomial interpolation [2]. Equation (\ref{eq:fit_x}) can be rewritten in terms of the monomial functions (for easier generalization later on) \begin{equation} M_{n}(x)=x^{n}\label{eq:mono} \end{equation} as \begin{equation} f(x)=\sum_{n=0}^{N}a_{n}M_{n}(x)\equiv\vec{a}^{\intercal}\cdot\vec{M}(x)\label{eq:fit_m} \end{equation} Substituting (\ref{eq:fit_m}) into (\ref{eq:res}) and taking the derivatives over $a_{n}$ we get \begin{equation} \sum_{n=0}^{N}\sum_{l=1}^{L}M_{k}(x_{l})M_{n}(x_{l})a_{n}=\sum_{l=1}^{L}M_{k}(x_{l})y_{l}\label{eq:ls_eqm} \end{equation} or \begin{equation} \hat{M}^{\intercal}\cdot\hat{M}\cdot\vec{a}=\hat{M}^{\intercal}\cdot\vec{y}\label{eq:ls_eqmat} \end{equation} where \begin{equation} M_{l,k}=M_{k}(x_{l})=x_{l}^{k}\label{eq:m_mat} \end{equation} This is a $L\times(N+1)$ Vandermonde matrix [3].


A formal solution of (\ref{eq:ls_eqmat}) is \begin{equation} \vec{a}=\left(\hat{M}^{\intercal}\cdot\hat{M}\right)^{-1}\cdot\hat{M}^{\intercal}\cdot\vec{y}\label{eq:m_solution} \end{equation} if the matrix $\hat{M}^{\intercal}\cdot\hat{M}$ is invertible. In this case the optimal coefficients are \begin{equation} a_{n}=\sum_{l=1}^{L}z_{n,l}^{N}\, y_{l} \end{equation} where \begin{equation} z_{n,l}^{N}=\sum_{k=0}^{N}\left[\left(\hat{M}^{\intercal}\cdot\hat{M}\right)^{-1}\right]_{n,k}M_{k}(x_{l}) \end{equation} For matrix inversion any conventional methods can be used, e.g. the LU decomposition. This solution is usually recommended in the literature (see e.g. [4]).

Better solution

The matrix (\ref{eq:m_mat}) can be factorized as the Singular value decomposition (SVD) \begin{equation} \hat{M}=\hat{U}\cdot\hat{\sigma}\cdot\hat{V}^{\intercal}\equiv\sum_{s}\sigma_{s}\vec{u}_{s}\cdot\vec{v}_{s}^{\intercal} \end{equation} Here only non-zero singular values $\sigma_{s}$ are included. The main least squares equation (\ref{eq:ls_eqmat}) can be rewritten now as \begin{equation} \sum_{s}\sigma_{s}\vec{v}_{s}\cdot\vec{v}_{s}^{\intercal}\cdot\vec{a}=\sum_{s}\sigma_{s}\vec{v}_{s}\cdot\vec{u}_{s}^{\intercal}\cdot\vec{y} \end{equation} or using the orthogonality of the singular vectors we can equate the components \begin{equation} \sigma_{s}\vec{v}_{s}^{\intercal}\cdot\vec{a}=\vec{u}_{s}^{\intercal}\cdot\vec{y} \end{equation} The solution now is \begin{equation} \vec{a}=\sum_{s}\sigma_{s}^{-1}\vec{v}_{s}\cdot\left(\vec{u}_{s}^{\intercal}\cdot\vec{y}\right)\label{eq:svd_solution} \end{equation} Note, that the condition number of this problem is \begin{equation} \kappa=\frac{max(\sigma_{s})}{min(\sigma_{s})} \end{equation} Here all zero singular values are already excluded. Whereas, the condition number of the problem (\ref{eq:ls_eqmat}) is $\kappa^{2}$, which can be much larger (and the solution (\ref{eq:m_solution}) is less accurate and stable).

The best solution

On the other hand, instead of the monomials (\ref{eq:mono}) in (\ref{eq:fit_m}) any linear independent set of polynomials can be used \begin{equation} f(x)=\sum_{n=0}^{N}b_{n}P_{n}(x)\label{eq:fit_p} \end{equation} And instead of (\ref{eq:ls_eqm}) we have \begin{equation} \sum_{n=0}^{N}\sum_{l=0}^{L}P_{k}(x_{l})P_{n}(x_{l})b_{n}=\sum_{l=0}^{L}P_{k}(x_{l})y_{l}\label{eq:ls_eqp} \end{equation} We can take advantage of the freedom of choice of the polynomials and require that \begin{equation} \sum_{l=0}^{L}P_{k}(x_{l})P_{n}(x_{l})\equiv\left\langle P_{k}|P_{n}\right\rangle =\delta_{n,k}\label{eq:p_ort} \end{equation} In the other words, the polynomials are orthonormal on the set $\{x_{1},x_{2},\ldots,x_{L}\}$ . The brackets denote the scalar product of L-component vectors. The condition (\ref{eq:p_ort}) reduces (\ref{eq:ls_eqp}) to \begin{equation} b_{k}=\sum_{l=0}^{L}P_{k}(x_{l})y_{l}=\left\langle P_{k}|Y\,\right\rangle \end{equation} There is no need for matrix inversion. If, in addition, we require that $P_{k}(x)$ is exactly of the order $k$ , then \begin{equation} f(x)=\sum_{k=0}^{K}b_{k}P_{k}(x) \end{equation} is automatically the fitting polynomial of the order $K$ in the least squares sense. If the basis polynomials are defined as \begin{equation} P_{k}(x)=\sum_{i=0}^{k}c_{k,i}M_{i}(x)\label{eq:ort_poly} \end{equation} then the coefficients are \begin{equation} a_{i}=\sum_{k=i}^{K}b_{k}c_{k,i} \end{equation}

How to build

The relation (\ref{eq:ort_poly}) can be rewritten as \begin{equation} \left(\begin{array}{c} P_{0}(x)\\ P_{1}(x)\\ P_{2}(x)\\ \vdots\\ P_{N}(x) \end{array}\right)=\left(\begin{array}{ccccc} c_{0,0} & 0 & 0 & \cdots & 0\\ c_{1,0} & c_{1,1} & 0 & \cdots & 0\\ c_{2,0} & c_{2,1} & c_{2,2} & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ c_{N,0} & c_{N,1} & c_{N,2} & \cdots & c_{N,N} \end{array}\right)\cdot\left(\begin{array}{c} M_{0}(x)\\ M_{1}(x)\\ M_{2}(x)\\ \vdots\\ M_{N}(x) \end{array}\right) \end{equation} or \begin{equation} \vec{P}(x)=\hat{C}\cdot\vec{M}(x) \end{equation} The orthogonal polynomials can be built recurcively using the Gram-Schmidt process [5]:

Initialization: \begin{equation} P_{0}(x)=M_{0}(x)/\sqrt{\left\langle M_{0}|M_{0}\right\rangle }\label{eq:init} \end{equation} Recursion: \begin{equation} P_{n+1}(x)=M_{n+1}(x)-\sum_{k=0}^{n}\left\langle M_{n+1}|P_{k}\right\rangle P_{k}(x)\label{eq:recur} \end{equation} and normalization \begin{equation} P_{n+1}(x)\Leftarrow P_{n+1}(x)/\sqrt{\left\langle P_{n+1}|P_{n+1}\right\rangle }\label{eq:norm} \end{equation}

Savitzky-Golay smoothing filter

For the case of equidistant abscissae, Savitzky and Golay [6] developed digital filters for smoothing signals and calculating derivatives. The The filters are based on local $L=2m+1$ polynomial regressions of the order $N\lt 2m$. Gorry [7] generalized the Savitzky-Golay filter for the end points using the orthogonal polynomial approach ("The best solution"). Furthermore, he derived closed analytical formulae for the polynomials. This subject will be discussed in the subsequent article.


  1. Wikipedia Polynomial regression.
  2. Wikipedia Polynomial interpolation.
  3. Wikipedia Vandermonde matrix.
  4. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery. Numerical Recipes: The Art of Scientific Computing .
  5. Wikipedia Gram-Schmidt process.
  6. A. Savitzky and M. J. E. Golay, "Smoothing and Differentiation of Data by Simplified Least Squares Procedures," Analytical Chemistry, vol. 36, pp. 1627-1639, 1964. www.
  7. P. A. Gorry, "General Least-Squares Smoothing and Differentiation by the Convolution (Savitzky-Golay) Method" Analytical Chemistry, vol. 62, pp. 570-573, 1990. www.

© Nikolai Shokhirev, 2012-2017

email: nikolai(dot)shokhirev(at)gmail(dot)com