Probability density function fitting

1. Introduction

Suppose there is a continuous random variable $x$. Probability density function (PDF) estimation, $\rho(x)$, is a common problem [1]: $$\rho(x)\approx f(x,\vec{p})\label{eq:estim}$$ Here $f(x,\vec{p})$ is the fitting function where $\vec{p}$ is the vector of parameters. There are two main approaches:
1. Density estimation can be treated as a general fitting problem.
2. Another approach is based on reproducing of statistical properties, mean variance and all other moments.
Here we review the first approach.

2. PDF properties

Any PDF $\rho(x)$ defined on interval $[a,b]$ satisfies the following properties
1. It is non-negative for every x in $[a,b]$ $$\rho(x)\geq0,\: x\in[a,b]\label{eq:non-neg}$$
2. It is normalized $$\intop_{a}^{b}\rho(x)dx=1\label{eq:norm-cond}$$.
Obviously the fitting function must satisfy the above properties. Note that one or both interval limits can be infinite.

3. Preliminary remarks

The method of least squares reduces a fitting problem to minimization. For the problem (\ref{eq:estim}) The objective function $F(\vec{p})$ is the integrated squared error $$F(\vec{p})=\intop_{a}^{b}\left[f\left(x,\vec{p}\right)-\rho\left(x\right)\right]^{2}dx\rightarrow min\label{eq:to min}$$ Equivalently $$\intop_{a}^{b}\left[f\left(x,\vec{p}\right)^{2}-2f\left(x,\vec{p}\right)\rho\left(x\right)+\rho\left(x\right)^{2}\right]dx\rightarrow min\label{eq:to min-2}$$ Note that $$\intop_{a}^{b}\rho\left(x\right)^{2}dx=const\label{eq:to min-3}$$ Therefore Eq.(\ref{eq:to min}) is equivalent to $$\intop_{a}^{b}\left[f\left(x,\vec{p}\right)^{2}-2f\left(x,\vec{p}\right)\rho\left(x\right)\right]dx\rightarrow min\label{eq:to min-4}$$ which is easier to use.

4. Least Squares Fitting

We need to minimize the objective function (\ref{eq:to min}). There are two variants of the least squares fitting. If the fitting function automatically satisfies the condition (\ref{eq:norm-cond}) then one should minimize the objective function (\ref{eq:to min}) or (\ref{eq:to min-4}) directly. Otherwise the condition (\ref{eq:norm-cond}) should be included into the objective function with the Lagrange multiplier $\lambda$ [2].

4.1. Least squares fitting with constraint

The objective function with the constraint (\ref{eq:norm-cond}) is $$F(\vec{p)}=\intop_{a}^{b}\left[f(x,\vec{p})-\rho(x)\right]^{2}dx-2\lambda\left[\intop_{a}^{b}f\left(x,\vec{p}\right)dx-1\right]\label{eq:obj-func-1}$$ If the objective function is a smooth function of parameters (which is usually the case), the minimization reduces to $$\frac{\partial}{\partial p_{i}}F(\vec{p})=0$$ or in a vector form $$\vec{\nabla}F\left(\vec{p}\right)=\vec{0}$$ Applying this to (\ref{eq:obj-func-1}) either in the form (\ref{eq:to min}) or (\ref{eq:to min-4}) we get $$\vec{0}=\intop_{a}^{b}f\left(x,\vec{p}\right)\vec{g}\left(x,\vec{p}\right)f\left(x,\vec{p}\right)dx-\intop_{a}^{b}\rho\left(x\right)\vec{g}\left(x,\vec{p}\right)f\left(x,\vec{p}\right)dx-\lambda\intop_{a}^{b}\vec{g}\left(x,\vec{p}\right)f\left(x,\vec{p}\right)dx\label{eq:with_constr}$$ where $$\vec{g}\left(x,\vec{p}\right)=\vec{\nabla}f\left(x,\vec{p}\right)\label{eq:grad}$$

4.2. Least squares fitting without constraint

The equations for this case can be obtained from (\ref{eq:with_constr}) by setting $\lambda = 0$.

In both cases it is necessary to solve the systems of equations. In the case of Eq. (\ref{eq:with_constr}) it also necessary to satisfy the normalization condition (\ref{eq:norm-cond}) and exclude $\lambda$ from the equations.

5. Examples

5.1. Histogram

Let us consider the histogram function [3] as a fitting function. $$f\left(x,\vec{p}\right)=\sum_{m=0}^{M-1}p_{m}\Pi_{m}(x)$$ (see [4] for definitions).

For an empirical density function $\rho_{e}(x)$ (see Eq. (1) in [5])

$\intop_{a}^{b}\Pi_{m}(x)\rho_{e}(x)dx=\intop_{mh}^{(m+1)h}\frac{1}{N}\sum_{i=1}^{N}\delta(x-x_{i})dx=\frac{n_{m}}{N}$ Here $n_{m}$ is the number of $x_{i}$between $a+mh$ and $a+(m+1)h$. Obviously $\sum_{m=0}^{M-1}n_{m}=N$ The Eq. (\ref{eq:with_constr}) becomes $0=p_{m}h-\frac{n_{m}}{N}-\lambda h$ which gives $$p_{m}=\frac{n_{m}}{Nh}+\lambda\label{eq:pm}$$ The normalization condition (\ref{eq:norm-cond}) is $1=h\sum_{m=0}^{M-1}p_{m}=\sum_{m=0}^{M-1}\left(\frac{n_{m}}{N}-\lambda h\right)=1-\lambda hN$ Therefore $\lambda=0$ in (\ref{eq:pm}).

Constrained vs. unconstrained fitting

Why do we need the Lagrange multipliers if we always can convert the constraint optimization into unconstraint one? Indeed, for any function $f(x,\vec{p})$ we can define $$\varphi(x,\vec{p})=\frac{f(x,\vec{p})}{\mathcal{N}(\vec{p})}\label{eq:unconstraint}$$ where $\mathcal{N}(\vec{p})=\intop_{a}^{b}f(x,\vec{p})dx$ The function $\varphi(x,\vec{p})$ is normalized and we can use the least squares fitting without constraint. However, the gradient in (\ref{eq:with_constr}) now is $$\vec{g}\left(x,\vec{p}\right)=\frac{\vec{\nabla}f\left(x,\vec{p}\right)}{\mathcal{N}(\vec{p})}-\frac{f(x,\vec{p})}{\left[\mathcal{N}(\vec{p})\right]^{2}}\vec{\nabla}N\left(\vec{p}\right)$$and the calculations are usually more complex.

References

1. Density estimation.
2. Lagrange multiplier.
3. Histogram. - Wikipedia.
4. Histogram: definitions and properties.
5. Empirical density functions
6. Density Estimation For Statistics And Data Analysis by B.W. Silverman
7. Nonparametric Estimation of Probability Density Functions
8. A bandwidth selection for kernel density estimation of functions of random variables, A.R Mugdadia, Ibrahim A Ahmadb. Computational Statistics & Data Analysis, v. 47, 2004, 49-62