Least absolute deviations
1. Introduction
Let us start with the least squares problem. Given $N$ data points
$x_{1},\, x_{2},\, x_{3},\,\ldots\,,\, x_{N}$, the problem is to
find the value $c$ that minimizes the sum of squared deviations
\begin{equation}
d_{2}(c)=\sum_{i=1}^{N}\left(x_{i}-c\right)^{2}\label{eq:res}
\end{equation}
The function $d_{2}(c)$ is a smooth function of $c$ and the problem
is solved by equating the derivative to zero
\begin{equation}
0=-\frac{1}{2}\frac{\partial}{\partial c}d_{2}(c)=\sum_{i=1}^{N}\left(x_{i}-c\right)
\end{equation}
or
\begin{equation}
c=\frac{1}{N}\sum_{i=1}^{N}x_{i}
\end{equation}
The optimal value is just the arithmetic mean of $x_{i}$.
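As a quick numerical check, the sketch below (arbitrary sample data, NumPy assumed) evaluates $d_{2}(c)$ on a grid and confirms that the minimizer agrees with the arithmetic mean:

```python
import numpy as np

# Illustrative data (not from the article): the arithmetic mean should
# minimize the sum of squared deviations d2(c).
x = np.array([1.0, 2.0, 2.5, 4.0, 7.0])

def d2(c, x):
    """Sum of squared deviations of the data x from the value c."""
    return np.sum((x - c) ** 2)

c_grid = np.linspace(x.min(), x.max(), 1001)
c_best = c_grid[np.argmin([d2(c, x) for c in c_grid])]

print("grid minimizer :", c_best)    # equals the mean up to the grid step
print("arithmetic mean:", x.mean())
```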
2. Least absolute deviations
A similar problem is to minimize the absolute deviations:
\begin{equation}
d_{1}(c)=\sum_{i=1}^{N}|\, x_{i}-c\,|\label{eq:res-1}
\end{equation}
If we assume that $x_{1}\leq\, x_{2}\leq\, x_{3}\leq\,\ldots\,\leq\, x_{N}$,
then
\begin{equation}
d_{1}(c)=\begin{cases}
X-Nc ,\; c\leq x_{1}\\
X-2x_{1}-(N-2)c ,\: x_{1}\lt c\leq x_{2}\\
\cdots \cdots\\
-X+Nc ,\; x_{N} \lt c
\end{cases}\label{eq:d1}
\end{equation}
where
\[
X=\sum_{i=1}^{N}x_{i}
\]
Eq. (\ref{eq:d1}) is a continuous broken line (piecewise linear function).
Its derivative is a piecewise constant function with jump discontinuities
at $x_{i}$:
\begin{equation}
d_{1}^{\,\prime}(c)=\begin{cases}
-N ,\; c\leq x_{1}\\
2(k-1)-N ,\: x_{k-1}\lt c\leq x_{k}\\
N ,\; x_{N} \lt c
\end{cases}\label{eq:dd1}
\end{equation}
2.1. Exact solution
The slope of $d_{1}(c)$ changes from $-N$ to $+N$, increasing by $+2$ at each
$x_{i}$. If $N$ is odd then for $c\lt x_{m}$ the slope is negative
and for $c>x_{m}$ the slope is positive. Here $m=\frac{N+1}{2}$
labels the median value. Therefore $c=x_{m}$ is the unique solution.
If $N$ is even then for $c\lt x_{m_{1}}$ the slope is negative, for $c>x_{m_{2}}$
the slope is positive, and for $x_{m_{1}}\lt c\lt x_{m_{2}}$ the function
$d_{1}(c)$ (\ref{eq:d1}) is flat. Here $m_{1}=\frac{N}{2}$ and
$m_{2}=\frac{N}{2}+1$. The ambiguity is conventionally resolved by
assigning to $c$ the median value $c=\frac{1}{2}\left(x_{m_{1}}+x_{m_{2}}\right)$.
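The same kind of check works for $d_{1}(c)$. A minimal sketch (arbitrary data, NumPy assumed) compares a grid minimizer of the sum of absolute deviations with `np.median` for both odd and even $N$:

```python
import numpy as np

# Illustrative check: the median minimizes the sum of absolute deviations
# d1(c).  For even N the minimum is attained on the whole interval
# [x_{m1}, x_{m2}], so any grid point inside it is a valid minimizer.
def d1(c, x):
    """Sum of absolute deviations of the data x from the value c."""
    return np.sum(np.abs(x - c))

for x in (np.array([3.0, 1.0, 7.0, 2.0, 5.0]),        # odd N
          np.array([3.0, 1.0, 7.0, 2.0, 5.0, 4.0])):  # even N
    c_grid = np.linspace(x.min(), x.max(), 2001)
    c_best = c_grid[np.argmin([d1(c, x) for c in c_grid])]
    print("N =", len(x), " median =", np.median(x), " a grid minimizer =", round(c_best, 3))
```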
2.2. Regularization
The absolute value function can be approximated by a smooth function:
\begin{equation}
|\, z\,|\approx\varphi_{\varepsilon}(z)=\sqrt{\varepsilon^{2}+z^{2}}\label{eq:sabs}
\end{equation}
Replacing the absolute value with (\ref{eq:sabs}) in (\ref{eq:res-1}) we
have
\[
\tilde{d}_{1}(c)=\sum_{i=1}^{N}\varphi_{\varepsilon}\left(x_{i}-c\right)
\]
Now the derivative is a continuous function and a regular approach
can be used
\[
0=-\frac{\partial}{\partial c}\tilde{d}_{1}(c)=\sum_{i=1}^{N}\varphi_{\varepsilon}^{\,\prime}\left(x_{i}-c\right)=\sum_{i=1}^{N}\frac{x_{i}-c}{\sqrt{\varepsilon^{2}+\left(x_{i}-c\right)^{2}}}
\]
Note that the regularization makes the solution unique.
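A minimal sketch of the regularized approach (arbitrary data and an arbitrarily chosen $\varepsilon$; SciPy's `brentq` root finder is assumed to be available) solves the smoothed optimality condition directly:

```python
import numpy as np
from scipy.optimize import brentq

# Solve sum_i (x_i - c) / sqrt(eps^2 + (x_i - c)^2) = 0 for c.
# Illustrative data with even N; eps is an arbitrary small parameter.
x = np.array([3.0, 1.0, 7.0, 2.0, 5.0, 4.0])
eps = 1e-3

def grad(c):
    """Negative derivative of the smoothed objective; decreases monotonically in c."""
    z = x - c
    return np.sum(z / np.sqrt(eps**2 + z**2))

# The root is bracketed by the data range.
c_reg = brentq(grad, x.min() - 1.0, x.max() + 1.0)
print("regularized solution:", c_reg)   # for even N it lands between x_{m1} and x_{m2}
print("median              :", np.median(x))
```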
3. Continuous distribution
The continuous analogue of (\ref{eq:res-1}) is
\begin{equation}
d_{1}(c)=\intop_{-\infty}^{\infty}\rho(x)\,|x-c|\, dx\label{eq:d1c}
\end{equation}
Here $\rho(x)$ is a density function [2]. It can be interpreted as a
probability density if we normalize it with the total "mass"
\[
N=\intop_{-\infty}^{\infty}\rho(x)\, dx
\]
By definition
\[
d_{1}(c)=-\intop_{-\infty}^{c}\rho(x)\,(x-c)\, dx+\intop_{c}^{\infty}\rho(x)\,(x-c)\, dx
\]
For a bounded density function the derivative is
\begin{eqnarray}
0 & = & \frac{\partial}{\partial c}d_{1}(c)\nonumber \\
 & = & \intop_{-\infty}^{c}\rho(x)\,dx-\intop_{c}^{\infty}\rho(x)\,dx\label{eq:med}
\end{eqnarray}
Here it is assumed that $\underset{x\rightarrow c}{\lim}\rho(x)\,(x-c)=0$.
Eq (\ref{eq:med}) shows that the optimal $c$ divides the distribution
$\rho(x)$ into two parts with equal amounts:
\begin{equation}
\intop_{-\infty}^{c}\rho(x)dx=\intop_{c}^{\infty}\rho(x)dx\label{eq:eq}
\end{equation}
In probability theory this is the median of $\rho(x)$ [1].
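As an illustration of (\ref{eq:eq}), the sketch below finds the median of a continuous distribution numerically (an exponential density is used purely as an example; SciPy's `quad` and `brentq` are assumed to be available):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def rho(x):
    """Exponential probability density: exp(-x) for x >= 0, zero otherwise."""
    return np.exp(-x) * (x >= 0)

def balance(c):
    """Mass to the left of c minus mass to the right; the median is its root."""
    left, _ = quad(rho, 0.0, c)        # the density is zero for x < 0
    right, _ = quad(rho, c, np.inf)
    return left - right

c_med = brentq(balance, 0.0, 10.0)
print("numerical median:", c_med)       # ~0.6931
print("exact ln 2      :", np.log(2.0))
```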
4. Back to the discrete case
The initial problem (\ref{eq:res-1}) can be presented in the form
(\ref{eq:d1c}) with the following density
\begin{equation}
\rho_{e}(x)=\sum_{i=1}^{N}\delta(x-x_{i})\label{eq:rhoe}
\end{equation}
Each delta function contributes 1 to the integrals in (\ref{eq:eq}),
and the integrals are just the counts of points to the left and to
the right of $c$. To satisfy the equality for even $N$, $c$ should
be placed somewhere between $x_{m_{1}}$ and $x_{m_{2}}$. For odd
$N$ the choice is unique: $c=x_{m}$, because then the counts on both
sides are equal and $x_{m}$ itself gives zero contribution to (\ref{eq:d1c}):
\[
\int\delta(x-x_{m})\,|x-x_{m}|\, dx=0
\]
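The counting argument is easy to reproduce numerically. A small sketch (arbitrary data, NumPy assumed):

```python
import numpy as np

# The integrals in (eq:eq) with the empirical density (eq:rhoe) reduce to
# counting points on each side of c.
x = np.sort(np.array([3.0, 1.0, 7.0, 2.0, 5.0]))   # odd N, median = x_m
c = np.median(x)

left  = np.sum(x < c)    # points strictly to the left of c
right = np.sum(x > c)    # points strictly to the right of c
print("c =", c, " left count =", left, " right count =", right)
# x_m itself sits at c and contributes |x_m - c| = 0 to d1(c).
```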
5. Numerical experiments
The IPython (HTML) notebooks with the numerical experiments are available at the links below.
References
1. Median, Wikipedia.
2. Empirical density functions, N.V. Shokhirev, NumericalExpert.com, 2013.