Least absolute deviations
1. Introduction
Let us start with the least squares problem. Given $N$ data points
$x_{1},\, x_{2},\, x_{3},\,\ldots\,,\, x_{N}$, the problem is to
find the value $c$ that minimizes the sum of squared deviations
\begin{equation}
d_{2}(c)=\sum_{i=1}^{N}\left(x_{i}-c\right)^{2}\label{eq:res}
\end{equation}
The function $d_{2}(c)$ is a smooth function of $c$ and the problem
is solved by equating the derivative to zero
\begin{equation}
0=-\frac{1}{2}\frac{\partial}{\partial c}d_{2}(c)=\sum_{i=1}^{N}\left(x_{i}-c\right)
\end{equation}
or
\begin{equation}
c=\frac{1}{N}\sum_{i=1}^{N}x_{i}
\end{equation}
The optimal value is just the arithmetic mean of $x_{i}$.
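As a quick numerical check, the sketch below (arbitrary sample data, NumPy assumed) evaluates $d_{2}(c)$ on a grid and confirms that the minimizer agrees with the arithmetic mean:

```python
import numpy as np

# Illustrative data (not from the article): the arithmetic mean should
# minimize the sum of squared deviations d2(c).
x = np.array([1.0, 2.0, 2.5, 4.0, 7.0])

def d2(c, x):
    """Sum of squared deviations of the data x from the value c."""
    return np.sum((x - c) ** 2)

c_grid = np.linspace(x.min(), x.max(), 1001)
c_best = c_grid[np.argmin([d2(c, x) for c in c_grid])]

print("grid minimizer :", c_best)    # equals the mean up to the grid step
print("arithmetic mean:", x.mean())
```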
2. Least absolute deviations
A similar problem is to minimize the absolute deviations:
\begin{equation}
d_{1}(c)=\sum_{i=1}^{N}|\, x_{i}-c\,|\label{eq:res-1}
\end{equation}
If we assume that $x_{1}\leq\, x_{2}\leq\, x_{3}\leq\,\ldots\,\leq\, x_{N}$,
then
\begin{equation}
d_{1}(c)=\begin{cases}
X-Nc ,\; c\leq x_{1}\\
X-2x_{1}-(N-2)c ,\: x_{1}\lt c\leq x_{2}\\
\cdots \cdots\\
-X+Nc ,\; x_{N} \lt c
\end{cases}\label{eq:d1}
\end{equation}
where
\[
X=\sum_{i=1}^{N}x_{i}
\]
Eq. (\ref{eq:d1}) is a continuous broken line (piecewise linear function).
Its derivative is a piecewise constant function with jump discontinuities
at $x_{i}$:
\begin{equation}
d_{1}^{\,\prime}(c)=\begin{cases}
-N ,\; c\leq x_{1}\\
2(k-1)-N ,\: x_{k-1}\lt c\leq x_{k}\\
N ,\; x_{N} \lt c
\end{cases}\label{eq:dd1}
\end{equation}
2.1. Exact solution
The slope of $d_{1}(c)$ changes from $-N$ to $+N$, increasing by $+2$ at each
$x_{i}$. If $N$ is odd then for $c\lt x_{m}$ the slope is negative
and for $c>x_{m}$ the slope is positive. Here $m=\frac{N+1}{2}$
labels the median value. Therefore $c=x_{m}$ is the unique solution.
If $N$ is even then for $c\lt x_{m_{1}}$ the slope is negative, for $c>x_{m_{2}}$
the slope is positive, and for $x_{m_{1}}\lt c\lt x_{m_{2}}$ the function
$d_{1}(c)$ (\ref{eq:d1}) is flat. Here $m_{1}=\frac{N}{2}$ and
$m_{2}=\frac{N}{2}+1$. The ambiguity is conventionally resolved by
assigning to $c$ the median value $c=\frac{1}{2}\left(x_{m_{1}}+x_{m_{2}}\right)$.
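The same kind of check works for $d_{1}(c)$. A minimal sketch (arbitrary data, NumPy assumed) compares a grid minimizer of the sum of absolute deviations with `np.median` for both odd and even $N$:

```python
import numpy as np

# Illustrative check: the median minimizes the sum of absolute deviations
# d1(c).  For even N the minimum is attained on the whole interval
# [x_{m1}, x_{m2}], so any grid point inside it is a valid minimizer.
def d1(c, x):
    """Sum of absolute deviations of the data x from the value c."""
    return np.sum(np.abs(x - c))

for x in (np.array([3.0, 1.0, 7.0, 2.0, 5.0]),        # odd N
          np.array([3.0, 1.0, 7.0, 2.0, 5.0, 4.0])):  # even N
    c_grid = np.linspace(x.min(), x.max(), 2001)
    c_best = c_grid[np.argmin([d1(c, x) for c in c_grid])]
    print("N =", len(x), " median =", np.median(x), " a grid minimizer =", round(c_best, 3))
```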
2.2. Regularization
The absolute value function can be approximated by a smooth function:
\begin{equation}
|\, z\,|\approx\varphi_{\varepsilon}(z)=\sqrt{\varepsilon^{2}+z^{2}}\label{eq:sabs}
\end{equation}
Replacing the absolute value with (\ref{eq:sabs}) in (\ref{eq:res-1}) we
have
\[
\tilde{d}_{1}(c)=\sum_{i=1}^{N}\varphi_{\varepsilon}\left(x_{i}-c\right)
\]
Now the derivative is a continuous function and a regular approach
can be used
\[
0=-\frac{\partial}{\partial c}\tilde{d}_{1}(c)=\sum_{i=1}^{N}\varphi_{\varepsilon}^{\,\prime}\left(x_{i}-c\right)=\sum_{i=1}^{N}\frac{x_{i}-c}{\sqrt{\varepsilon^{2}+\left(x_{i}-c\right)^{2}}}
\]
Note that the regularization makes the solution unique.
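A minimal sketch of the regularized approach (arbitrary data and an arbitrarily chosen $\varepsilon$; SciPy's `brentq` root finder is assumed to be available) solves the smoothed optimality condition directly:

```python
import numpy as np
from scipy.optimize import brentq

# Solve sum_i (x_i - c) / sqrt(eps^2 + (x_i - c)^2) = 0 for c.
# Illustrative data with even N; eps is an arbitrary small parameter.
x = np.array([3.0, 1.0, 7.0, 2.0, 5.0, 4.0])
eps = 1e-3

def grad(c):
    """Negative derivative of the smoothed objective; decreases monotonically in c."""
    z = x - c
    return np.sum(z / np.sqrt(eps**2 + z**2))

# The root is bracketed by the data range.
c_reg = brentq(grad, x.min() - 1.0, x.max() + 1.0)
print("regularized solution:", c_reg)   # for even N it lands between x_{m1} and x_{m2}
print("median              :", np.median(x))
```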
3. Continuous distribution
The continuous analogue of (\ref{eq:res-1}) is
\begin{equation}
d_{1}(c)=\intop_{-\infty}^{\infty}\rho(x)\,|x-c|\, dx\label{eq:d1c}
\end{equation}
Here $\rho(x)$ is a density function [2]. It can be interpreted as a
probability density if we normalize it with the total "mass"
\[
N=\intop_{-\infty}^{\infty}\rho(x)\, dx
\]
By definition
\[
d_{1}(c)=-\intop_{-\infty}^{c}\rho(x)\,(x-c)\, dx+\intop_{c}^{\infty}\rho(x)\,(x-c)\, dx
\]
For a bounded density function the derivative is
\begin{eqnarray}
0 & = & \frac{\partial}{\partial c}d_{1}(c)\nonumber \\
 & = & \intop_{-\infty}^{c}\rho(x)\,dx-\intop_{c}^{\infty}\rho(x)\,dx\label{eq:med}
\end{eqnarray}
Here it is assumed that $\underset{x\rightarrow c}{\lim}\rho(x)\,(x-c)=0$.
Eq (\ref{eq:med}) shows that the optimal $c$ divides the distribution
$\rho(x)$ into two parts with equal amounts:
\begin{equation}
\intop_{-\infty}^{c}\rho(x)dx=\intop_{c}^{\infty}\rho(x)dx\label{eq:eq}
\end{equation}
In probability theory this is the median of $\rho(x)$ [1].
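As an illustration of (\ref{eq:eq}), the sketch below finds the median of a continuous distribution numerically (an exponential density is used purely as an example; SciPy's `quad` and `brentq` are assumed to be available):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def rho(x):
    """Exponential probability density: exp(-x) for x >= 0, zero otherwise."""
    return np.exp(-x) * (x >= 0)

def balance(c):
    """Mass to the left of c minus mass to the right; the median is its root."""
    left, _ = quad(rho, 0.0, c)        # the density is zero for x < 0
    right, _ = quad(rho, c, np.inf)
    return left - right

c_med = brentq(balance, 0.0, 10.0)
print("numerical median:", c_med)       # ~0.6931
print("exact ln 2      :", np.log(2.0))
```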
4. Back to the discrete case
The initial problem (\ref{eq:res-1}) can be presented in the form
(\ref{eq:d1c}) with the following density
\begin{equation}
\rho_{e}(x)=\sum_{i=1}^{N}\delta(x-x_{i})\label{eq:rhoe}
\end{equation}
Each delta function contributes 1 to the integrals in (\ref{eq:eq}),
and the integrals are just the counts of points to the left and to
the right of $c$. To satisfy the equality for even $N$, $c$ should
be placed somewhere between $x_{m_{1}}$ and $x_{m_{2}}$. For odd
$N$ the choice is unique: $c=x_{m}$, because then the counts on both
sides are equal and $x_{m}$ itself gives zero contribution to (\ref{eq:d1c}):
\[
\int\delta(x-x_{m})\,|x-x_{m}|\, dx=0
\]
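The counting argument is easy to reproduce numerically. A small sketch (arbitrary data, NumPy assumed):

```python
import numpy as np

# The integrals in (eq:eq) with the empirical density (eq:rhoe) reduce to
# counting points on each side of c.
x = np.sort(np.array([3.0, 1.0, 7.0, 2.0, 5.0]))   # odd N, median = x_m
c = np.median(x)

left  = np.sum(x < c)    # points strictly to the left of c
right = np.sum(x > c)    # points strictly to the right of c
print("c =", c, " left count =", left, " right count =", right)
# x_m itself sits at c and contributes |x_m - c| = 0 to d1(c).
```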
5. Numerical experiments
The IPython (HTML) notebooks with the numerical experiments are available at the links below.
References
1. Median, Wikipedia.
2. Empirical density functions, N.V. Shokhirev, NumericalExpert.com, 2013.