Numerical experiments, Tips, Tricks and Gotchas

## Least absolute deviations

### 1. Introduction

Let us start with the least squares problem. Given $N$ data points $x_{1},\, x_{2},\, x_{3},\,\ldots\,,\, x_{N}$, the problem is to find the value $c$ which minimizes the sum of squared deviations $$d_{2}(c)=\sum_{i=1}^{N}\left(x_{i}-c\right)^{2}\label{eq:res}$$ The function $d_{2}(c)$ is a smooth function of $c$ and the problem is solved by equating the derivative to zero: $$0=-\frac{1}{2}\frac{\partial}{\partial c}d_{2}(c)=\sum_{i=1}^{N}\left(x_{i}-c\right)$$ or $$c=\frac{1}{N}\sum_{i=1}^{N}x_{i}$$ The optimal value is just the arithmetic mean of the $x_{i}$.
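This result is easy to check numerically. A minimal sketch (the sample values are arbitrary): the mean should beat every nearby candidate value of $c$.

```python
# Verify that the arithmetic mean minimizes the sum of squared deviations.
x = [2.0, 3.0, 5.0, 7.0, 11.0]
N = len(x)

def d2(c):
    """Sum of squared deviations from c."""
    return sum((xi - c) ** 2 for xi in x)

c_mean = sum(x) / N                                  # analytic minimizer
# Scan candidates around the mean: none should give a smaller d2.
candidates = [c_mean + k * 0.01 for k in range(-100, 101)]
assert min(candidates, key=d2) == c_mean
print(c_mean)  # 5.6
```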

### 2. Least absolute deviations

A similar problem is to minimize the sum of absolute deviations: $$d_{1}(c)=\sum_{i=1}^{N}|\, x_{i}-c\,|\label{eq:res-1}$$ If we assume that $x_{1}\leq x_{2}\leq x_{3}\leq\ldots\leq x_{N}$, then $$d_{1}(c)=\begin{cases} X-Nc ,\; c\leq x_{1}\\ X-2x_{1}-(N-2)c ,\: x_{1}\lt c\leq x_{2}\\ \cdots \cdots\\ -X+Nc ,\; x_{N} \lt c \end{cases}\label{eq:d1}$$ where $X=\sum_{i=1}^{N}x_{i}$. Eq. (\ref{eq:d1}) is a continuous broken line (piecewise linear function). Its derivative is a piecewise constant function with jump discontinuities at the $x_{i}$: $$d_{1}^{\,\prime}(c)=\begin{cases} -N ,\; c\leq x_{1}\\ 2(k-1)-N ,\: x_{k-1}\lt c\leq x_{k}\\ N ,\; x_{N} \lt c \end{cases}\label{eq:dd1}$$
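The piecewise-constant slope can be checked numerically: on each interval between sorted data points it equals the number of points below $c$ minus the number of points above $c$. A minimal sketch with arbitrary sample values:

```python
# Check the slope of d1(c) between sorted data points:
# slope = (#points below c) - (#points above c) = 2*left - N.
x = sorted([2.0, 3.0, 5.0, 7.0, 11.0])
N = len(x)

def d1(c):
    return sum(abs(xi - c) for xi in x)

def slope(c, h=1e-6):
    """Central-difference slope; exact away from the kinks."""
    return (d1(c + h) - d1(c - h)) / (2 * h)

# Probe interval midpoints, plus points outside the data range.
probes = [x[0] - 1.0] + [(a + b) / 2 for a, b in zip(x, x[1:])] + [x[-1] + 1.0]
for c in probes:
    left = sum(1 for xi in x if xi < c)
    assert abs(slope(c) - (2 * left - N)) < 1e-5
print([int(round(slope(c))) for c in probes])  # [-5, -3, -1, 1, 3, 5]
```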

#### 2.1. Exact solution

The slope of $d_{1}(c)$ increases from $-N$ to $+N$ in steps of $+2$, one step at each $x_{i}$. If $N$ is odd, then for $c\lt x_{m}$ the slope is negative and for $c>x_{m}$ it is positive, where $m=\frac{N+1}{2}$ labels the median value. Therefore $c=x_{m}$ is the unique solution. If $N$ is even, then for $c\lt x_{m_{1}}$ the slope is negative, for $c>x_{m_{2}}$ it is positive, and for $x_{m_{1}}\lt c\lt x_{m_{2}}$ the function $d_{1}(c)$ (\ref{eq:d1}) is flat. Here $m_{1}=\frac{N}{2}$ and $m_{2}=\frac{N}{2}+1$. This ambiguity is conventionally resolved by assigning to $c$ the median value $c=\frac{1}{2}\left(x_{m_{1}}+x_{m_{2}}\right)$.
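The exact solution can be coded directly. A minimal sketch (`median_lad` is a hypothetical helper name, not from the original):

```python
def median_lad(data):
    """Minimizer of the sum of absolute deviations: the sample median."""
    x = sorted(data)
    N = len(x)
    if N % 2 == 1:                      # odd N: unique minimizer x_m
        return x[N // 2]
    # even N: any c in [x_{m1}, x_{m2}] is optimal; take the midpoint
    return 0.5 * (x[N // 2 - 1] + x[N // 2])

print(median_lad([7, 2, 11, 3, 5]))    # 5  (odd N, middle value)
print(median_lad([7, 2, 11, 3]))       # 5.0 (even N, midpoint of 3 and 7)
```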

#### 2.2. Regularization

The $abs$ function can be approximated by a smooth function: $$|\, z\,|\approx\varphi_{\varepsilon}(z)=\sqrt{\varepsilon^{2}+z^{2}}\label{eq:sabs}$$ Substituting $abs$ with (\ref{eq:sabs}) in (\ref{eq:res-1}) gives $$\tilde{d}_{1}(c)=\sum_{i=1}^{N}\varphi_{\varepsilon}\left(x_{i}-c\right)$$ Now the derivative is a continuous function and the standard approach can be used: $$0=-\frac{\partial}{\partial c}\tilde{d}_{1}(c)=\sum_{i=1}^{N}\varphi_{\varepsilon}^{\,\prime}\left(x_{i}-c\right)=\sum_{i=1}^{N}\frac{x_{i}-c}{\sqrt{\varepsilon^{2}+\left(x_{i}-c\right)^{2}}}$$ Note that the regularization makes the solution unique: $\varphi_{\varepsilon}$ is strictly convex, and therefore so is $\tilde{d}_{1}$.
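One way to solve the stationarity condition is to rearrange it as a fixed point, $c=\sum_{i}w_{i}x_{i}/\sum_{i}w_{i}$ with $w_{i}=1/\sqrt{\varepsilon^{2}+(x_{i}-c)^{2}}$, and iterate (iteratively reweighted averaging). This is a sketch, not the article's method; the function name and sample values are assumptions:

```python
import math

def smooth_l1_minimizer(x, eps=1e-3, tol=1e-10, max_iter=1000):
    """Minimize sum_i sqrt(eps^2 + (x_i - c)^2) by iteratively
    reweighted averaging: c = sum(w_i * x_i) / sum(w_i)."""
    c = sum(x) / len(x)                  # start from the mean
    for _ in range(max_iter):
        w = [1.0 / math.sqrt(eps**2 + (xi - c)**2) for xi in x]
        c_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(c_new - c) < tol:
            break
        c = c_new
    return c

x = [2.0, 3.0, 5.0, 7.0, 11.0]
print(smooth_l1_minimizer(x))            # close to the median 5 for small eps
```

As $\varepsilon\rightarrow0$ the minimizer approaches the exact median.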

### 3. Continuous distribution

The continuous analogue of (\ref{eq:res-1}) is $$d_{1}(c)=\intop_{-\infty}^{\infty}\rho(x)\,|x-c|\, dx\label{eq:d1c}$$ Here $\rho(x)$ is a density function [2]. It can be interpreted as a probability density if we normalize it by the total "mass" $N=\intop_{-\infty}^{\infty}\rho(x)\, dx$. By definition $$d_{1}(c)=-\intop_{-\infty}^{c}\rho(x)\,(x-c)\, dx+\intop_{c}^{\infty}\rho(x)\,(x-c)\, dx$$ For a bounded density function the derivative is $$0=\frac{\partial}{\partial c}d_{1}(c)=\intop_{-\infty}^{c}\rho(x)\, dx-\intop_{c}^{\infty}\rho(x)\, dx\label{eq:med}$$ Here it is assumed that $\underset{x\rightarrow c}{\lim}\rho(x)\,(x-c)=0$, so the boundary terms vanish. Eq. (\ref{eq:med}) shows that the optimal $c$ divides the distribution $\rho(x)$ into two parts of equal weight: $$\intop_{-\infty}^{c}\rho(x)\, dx=\intop_{c}^{\infty}\rho(x)\, dx\label{eq:eq}$$ In probability theory this is the median of $\rho(x)$ [1].
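As a concrete worked example (not in the original), take the exponential density $\rho(x)=\lambda e^{-\lambda x}$ for $x\geq0$, which is already normalized ($N=1$). Condition (\ref{eq:eq}) gives $$\intop_{0}^{c}\lambda e^{-\lambda x}\, dx=\intop_{c}^{\infty}\lambda e^{-\lambda x}\, dx\;\Rightarrow\;1-e^{-\lambda c}=e^{-\lambda c}\;\Rightarrow\; c=\frac{\ln2}{\lambda}$$ Note that the median $\ln2/\lambda\approx0.693/\lambda$ differs from the mean $1/\lambda$, since the distribution is skewed.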

### 4. Back to the discrete case

The initial problem (\ref{eq:res-1}) can be presented in the form (\ref{eq:d1c}) with the following density: $$\rho_{e}(x)=\sum_{i=1}^{N}\delta(x-x_{i})\label{eq:rhoe}$$ Each delta function contributes 1 to the integrals in (\ref{eq:eq}), so the integrals are just the counts of points to the left and to the right of $c$. To satisfy the equality for even $N$, $c$ should be placed somewhere between $x_{m_{1}}$ and $x_{m_{2}}$. For odd $N$ the choice is unique, $c=x_{m}$: the counts on both sides are then equal, and $x_{m}$ itself gives zero contribution to (\ref{eq:d1c}): $\int\delta(x-x_{m})\,|x-x_{m}|\, dx=0$
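With the empirical density the integrals in the median condition reduce to simple point counts, which can be illustrated directly (a minimal sketch; the sample values are arbitrary):

```python
# For the empirical density (a sum of delta functions), the integrals
# in the median condition become counts of points on each side of c.
x = sorted([2.0, 3.0, 5.0, 7.0, 11.0])   # odd N = 5
c = x[len(x) // 2]                        # median x_m
left = sum(1 for xi in x if xi < c)       # "integral" over (-inf, c)
right = sum(1 for xi in x if xi > c)      # "integral" over (c, inf)
assert left == right                      # x_m itself contributes zero
print(c, left, right)                     # 5.0 2 2
```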

### 5. Numerical experiments

The IPython HTML notebook experiments are available at the links below.


### References

1. "Median", Wikipedia.
2. N. V. Shokhirev, "Empirical density functions", NumericalExpert.com, 2013.