Probability density function fitting

Nikolai Shokhirev

December 18, 2012

1. Introduction

Suppose there is a continuous random variable $x$. Probability density function (PDF) estimation, $\rho(x)$, is a common problem [1]: \begin{equation} \rho(x)\approx f(x,\vec{p})\label{eq:estim} \end{equation} Here $f(x,\vec{p})$ is the fitting function where $\vec{p}$ is the vector of parameters. There are two main approaches:

Density estimation can be treated as a general fitting problem.
Another approach is based on reproducing of statistical properties, mean variance and all other moments.

Here we review the first approach.

2. PDF properties

Any PDF $\rho(x)$ defined on interval $[a,b]$ satisfies the following properties

It is non-negative for every x in $[a,b]$ \begin{equation} \rho(x)\geq0,\: x\in[a,b]\label{eq:non-neg} \end{equation}
It is normalized \begin{equation} \intop_{a}^{b}\rho(x)dx=1\label{eq:norm-cond} \end{equation}.

Obviously the fitting function must satisfy the above properties. Note that one or both interval limits can be infinite.

3. Preliminary remarks

The method of least squares reduces a fitting problem to minimization. For the problem (\ref{eq:estim}) The objective function $F(\vec{p})$ is the integrated squared error \begin{equation} F(\vec{p})=\intop_{a}^{b}\left[f\left(x,\vec{p}\right)-\rho\left(x\right)\right]^{2}dx\rightarrow min\label{eq:to min} \end{equation} Equivalently \begin{equation} \intop_{a}^{b}\left[f\left(x,\vec{p}\right)^{2}-2f\left(x,\vec{p}\right)\rho\left(x\right)+\rho\left(x\right)^{2}\right]dx\rightarrow min\label{eq:to min-2} \end{equation} Note that \begin{equation} \intop_{a}^{b}\rho\left(x\right)^{2}dx=const\label{eq:to min-3} \end{equation} Therefore Eq.(\ref{eq:to min}) is equivalent to \begin{equation} \intop_{a}^{b}\left[f\left(x,\vec{p}\right)^{2}-2f\left(x,\vec{p}\right)\rho\left(x\right)\right]dx\rightarrow min\label{eq:to min-4} \end{equation} which is easier to use.

4. Least Squares Fitting

We need to minimize the objective function (\ref{eq:to min}). There are two variants of the least squares fitting. If the fitting function automatically satisfies the condition (\ref{eq:norm-cond}) then one should minimize the objective function (\ref{eq:to min}) or (\ref{eq:to min-4}) directly. Otherwise the condition (\ref{eq:norm-cond}) should be included into the objective function with the Lagrange multiplier $\lambda$ [2].

4.1. Least squares fitting with constraint

The objective function with the constraint (\ref{eq:norm-cond}) is \begin{equation} F(\vec{p)}=\intop_{a}^{b}\left[f(x,\vec{p})-\rho(x)\right]^{2}dx-2\lambda\left[\intop_{a}^{b}f\left(x,\vec{p}\right)dx-1\right]\label{eq:obj-func-1} \end{equation} If the objective function is a smooth function of parameters (which is usually the case), the minimization reduces to \begin{equation} \frac{\partial}{\partial p_{i}}F(\vec{p})=0 \end{equation} or in a vector form \begin{equation} \vec{\nabla}F\left(\vec{p}\right)=\vec{0} \end{equation} Applying this to (\ref{eq:obj-func-1}) either in the form (\ref{eq:to min}) or (\ref{eq:to min-4}) we get \begin{equation} \vec{0}=\intop_{a}^{b}f\left(x,\vec{p}\right)\vec{g}\left(x,\vec{p}\right)f\left(x,\vec{p}\right)dx-\intop_{a}^{b}\rho\left(x\right)\vec{g}\left(x,\vec{p}\right)f\left(x,\vec{p}\right)dx-\lambda\intop_{a}^{b}\vec{g}\left(x,\vec{p}\right)f\left(x,\vec{p}\right)dx\label{eq:with_constr} \end{equation} where \begin{equation} \vec{g}\left(x,\vec{p}\right)=\vec{\nabla}f\left(x,\vec{p}\right)\label{eq:grad} \end{equation}

4.2. Least squares fitting without constraint

The equations for this case can be obtained from (\ref{eq:with_constr}) by setting $\lambda = 0$.

In both cases it is necessary to solve the systems of equations. In the case of Eq. (\ref{eq:with_constr}) it also necessary to satisfy the normalization condition (\ref{eq:norm-cond}) and exclude $\lambda$ from the equations.

5. Examples

5.1. Histogram

Let us consider the histogram function [3] as a fitting function. \begin{equation} f\left(x,\vec{p}\right)=\sum_{m=0}^{M-1}p_{m}\Pi_{m}(x) \end{equation} (see [4] for definitions).

For an empirical density function $\rho_{e}(x)$ (see Eq. (1) in [5])

\[ \intop_{a}^{b}\Pi_{m}(x)\rho_{e}(x)dx=\intop_{mh}^{(m+1)h}\frac{1}{N}\sum_{i=1}^{N}\delta(x-x_{i})dx=\frac{n_{m}}{N} \] Here $n_{m}$ is the number of $x_{i}$between $a+mh$ and $a+(m+1)h$. Obviously \[ \sum_{m=0}^{M-1}n_{m}=N \] The Eq. (\ref{eq:with_constr}) becomes \[ 0=p_{m}h-\frac{n_{m}}{N}-\lambda h \] which gives \begin{equation} p_{m}=\frac{n_{m}}{Nh}+\lambda\label{eq:pm} \end{equation} The normalization condition (\ref{eq:norm-cond}) is \[ 1=h\sum_{m=0}^{M-1}p_{m}=\sum_{m=0}^{M-1}\left(\frac{n_{m}}{N}-\lambda h\right)=1-\lambda hN \] Therefore $\lambda=0$ in (\ref{eq:pm}).

Constrained vs. unconstrained fitting

Why do we need the Lagrange multipliers if we always can convert the constraint optimization into unconstraint one? Indeed, for any function $f(x,\vec{p})$ we can define \begin{equation} \varphi(x,\vec{p})=\frac{f(x,\vec{p})}{\mathcal{N}(\vec{p})}\label{eq:unconstraint} \end{equation} where \[ \mathcal{N}(\vec{p})=\intop_{a}^{b}f(x,\vec{p})dx \] The function $\varphi(x,\vec{p})$ is normalized and we can use the least squares fitting without constraint. However, the gradient in (\ref{eq:with_constr}) now is \begin{equation} \vec{g}\left(x,\vec{p}\right)=\frac{\vec{\nabla}f\left(x,\vec{p}\right)}{\mathcal{N}(\vec{p})}-\frac{f(x,\vec{p})}{\left[\mathcal{N}(\vec{p})\right]^{2}}\vec{\nabla}N\left(\vec{p}\right) \end{equation}and the calculations are usually more complex.

References

Density estimation.
Lagrange multiplier.
Histogram. - Wikipedia.
Histogram: definitions and properties.
Empirical density functions
Density Estimation For Statistics And Data Analysis by B.W. Silverman
Nonparametric Estimation of Probability Density Functions
A bandwidth selection for kernel density estimation of functions of random variables, A.R Mugdadia, Ibrahim A Ahmadb. Computational Statistics & Data Analysis, v. 47, 2004, 49-62