Probability density function fitting
Nikolai Shokhirev
December 18, 2012
1. Introduction
Suppose there is a continuous random variable $x$. Estimation of its probability density function (PDF) $\rho(x)$ is a common problem [1]:
\begin{equation}
\rho(x)\approx f(x,\vec{p})\label{eq:estim}
\end{equation}
Here $f(x,\vec{p})$ is the fitting function and $\vec{p}$ is the vector of its parameters. There are two main approaches:
- Density estimation can be treated as a general fitting problem.
- Another approach is based on reproducing statistical properties: the mean, the variance, and all higher moments.
Here we review the first approach.
2. PDF properties
Any PDF $\rho(x)$ defined on the interval $[a,b]$ satisfies the following properties:
- It is non-negative for every $x$ in $[a,b]$
\begin{equation}
\rho(x)\geq0,\: x\in[a,b]\label{eq:non-neg}
\end{equation}
- It is normalized
\begin{equation}
\intop_{a}^{b}\rho(x)dx=1\label{eq:norm-cond}
\end{equation}
Obviously, the fitting function must satisfy the same properties. Note that one or both of the interval limits can be infinite.
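Both properties are easy to check numerically for a candidate fitting function. Below is a minimal sketch, assuming a hypothetical truncated-exponential candidate on $[0,1]$ (the function and its parameter are illustrative, not from the text); it verifies non-negativity on a grid and the normalization integral with scipy.

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 1.0

def f(x, lam=2.0):
    # Hypothetical candidate: exponential density truncated to [a, b] and renormalized.
    norm = (np.exp(-lam * a) - np.exp(-lam * b)) / lam
    return np.exp(-lam * x) / norm

xs = np.linspace(a, b, 1001)
print(np.all(f(xs) >= 0))   # non-negativity, Eq. (non-neg)
print(quad(f, a, b)[0])     # ~ 1.0, normalization, Eq. (norm-cond)
```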
3. Preliminary remarks
The method of least squares reduces a fitting problem to minimization. For the problem (\ref{eq:estim}) the objective function $F(\vec{p})$
is the integrated squared error
\begin{equation}
F(\vec{p})=\intop_{a}^{b}\left[f\left(x,\vec{p}\right)-\rho\left(x\right)\right]^{2}dx\rightarrow min\label{eq:to min}
\end{equation}
Equivalently
\begin{equation}
\intop_{a}^{b}\left[f\left(x,\vec{p}\right)^{2}-2f\left(x,\vec{p}\right)\rho\left(x\right)+\rho\left(x\right)^{2}\right]dx\rightarrow min\label{eq:to min-2}
\end{equation}
Note that
\begin{equation}
\intop_{a}^{b}\rho\left(x\right)^{2}dx=const\label{eq:to min-3}
\end{equation}
Therefore Eq.(\ref{eq:to min}) is equivalent to
\begin{equation}
\intop_{a}^{b}\left[f\left(x,\vec{p}\right)^{2}-2f\left(x,\vec{p}\right)\rho\left(x\right)\right]dx\rightarrow min\label{eq:to min-4}
\end{equation}
which is easier to use.
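The reduction can also be illustrated numerically: at every value of the parameters the two objectives differ by the same constant (\ref{eq:to min-3}), so they share their minima. In the sketch below the target $\rho$ is taken to be a Beta(2, 5) density and the fitting family is a one-parameter Beta family; both choices are illustrative assumptions, not part of the text.

```python
from scipy.integrate import quad
from scipy.stats import beta

a, b = 0.0, 1.0
rho = beta(2.0, 5.0).pdf                       # stand-in for the unknown density

def f(x, p):
    return beta(2.0, p).pdf(x)                 # hypothetical one-parameter family

def F_full(p):                                 # Eq. (to min)
    return quad(lambda x: (f(x, p) - rho(x)) ** 2, a, b)[0]

def F_reduced(p):                              # Eq. (to min-4)
    return quad(lambda x: f(x, p) ** 2 - 2.0 * f(x, p) * rho(x), a, b)[0]

const = quad(lambda x: rho(x) ** 2, a, b)[0]   # Eq. (to min-3)
for p in (3.0, 5.0, 8.0):
    print(p, F_full(p) - F_reduced(p), const)  # the difference is the same constant
```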
4. Least squares fitting
We need to minimize the objective function (\ref{eq:to min}). There are two variants of least squares fitting. If the fitting function
automatically satisfies the normalization condition (\ref{eq:norm-cond}), then one can minimize the objective function (\ref{eq:to min}) or (\ref{eq:to min-4})
directly. Otherwise the condition (\ref{eq:norm-cond}) should be included in the objective function with a Lagrange multiplier
$\lambda$ [2].
4.1. Least squares fitting with constraint
The objective function with the constraint (\ref{eq:norm-cond}) is
\begin{equation}
F(\vec{p})=\intop_{a}^{b}\left[f(x,\vec{p})-\rho(x)\right]^{2}dx-2\lambda\left[\intop_{a}^{b}f\left(x,\vec{p}\right)dx-1\right]\label{eq:obj-func-1}
\end{equation}
If the objective function is a smooth function of parameters (which is usually the case), the minimization reduces to
\begin{equation}
\frac{\partial}{\partial p_{i}}F(\vec{p})=0
\end{equation}
or in a vector form
\begin{equation}
\vec{\nabla}F\left(\vec{p}\right)=\vec{0}
\end{equation}
Applying this to (\ref{eq:obj-func-1}), with the squared error taken either in the form (\ref{eq:to min}) or (\ref{eq:to min-4}), differentiating under the integral sign and dividing by 2, we get
\begin{equation}
\vec{0}=\intop_{a}^{b}f\left(x,\vec{p}\right)\vec{g}\left(x,\vec{p}\right)dx-\intop_{a}^{b}\rho\left(x\right)\vec{g}\left(x,\vec{p}\right)dx-\lambda\intop_{a}^{b}\vec{g}\left(x,\vec{p}\right)dx\label{eq:with_constr}
\end{equation}
where
\begin{equation}
\vec{g}\left(x,\vec{p}\right)=\vec{\nabla}f\left(x,\vec{p}\right)\label{eq:grad}
\end{equation}
4.2. Least squares fitting without constraint
The equations for this case can be obtained from (\ref{eq:with_constr}) by setting $\lambda = 0$.
In both cases it is necessary to solve a system of equations. In the case of Eq. (\ref{eq:with_constr}) it is also necessary to satisfy the normalization condition
(\ref{eq:norm-cond}) and to eliminate $\lambda$ from the equations.
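As a concrete illustration of the unconstrained variant, the sketch below fits a Beta$(\alpha,\beta)$ family (normalized by construction, so condition (\ref{eq:norm-cond}) holds automatically) to a known target density by minimizing objective (\ref{eq:to min-4}) with a derivative-free optimizer. The target density, the fitting family, and the optimizer settings are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.stats import beta

a, b = 0.0, 1.0
rho = beta(2.0, 5.0).pdf                       # "unknown" density to recover

def objective(q):                              # Eq. (to min-4) for f = Beta(alpha, beta) pdf
    alpha, bet = np.exp(q)                     # log-parameters keep the shape parameters positive
    f = beta(alpha, bet).pdf
    return quad(lambda x: f(x) ** 2 - 2.0 * f(x) * rho(x), a, b)[0]

res = minimize(objective, x0=np.log([1.5, 1.5]), method="Nelder-Mead")
print(np.exp(res.x))                           # should approach (2, 5)
```

A family that is not normalized by construction would instead require Eq. (\ref{eq:with_constr}) together with (\ref{eq:norm-cond}), or the explicit normalization discussed below.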
5. Examples
5.1. Histogram
Let us consider the histogram function [3] as the fitting function:
\begin{equation}
f\left(x,\vec{p}\right)=\sum_{m=0}^{M-1}p_{m}\Pi_{m}(x)
\end{equation}
(see [4] for definitions; $\Pi_{m}(x)$ is the indicator of the $m$-th bin $[a+mh,\,a+(m+1)h]$ with bin width $h=(b-a)/M$).
For an empirical density function $\rho_{e}(x)$ (see Eq. (1) in [5])
\[
\intop_{a}^{b}\Pi_{m}(x)\rho_{e}(x)dx=\intop_{a+mh}^{a+(m+1)h}\frac{1}{N}\sum_{i=1}^{N}\delta(x-x_{i})dx=\frac{n_{m}}{N}
\]
Here $n_{m}$ is the number of sample points $x_{i}$ between $a+mh$ and $a+(m+1)h$.
Obviously
\[
\sum_{m=0}^{M-1}n_{m}=N
\]
For this fitting function Eq. (\ref{eq:with_constr}) becomes
\[
0=p_{m}h-\frac{n_{m}}{N}-\lambda h
\]
which gives
\begin{equation}
p_{m}=\frac{n_{m}}{Nh}+\lambda\label{eq:pm}
\end{equation}
The normalization condition (\ref{eq:norm-cond}) is
\[
1=h\sum_{m=0}^{M-1}p_{m}=\sum_{m=0}^{M-1}\left(\frac{n_{m}}{N}+\lambda h\right)=1+\lambda hM
\]
Therefore $\lambda=0$ in (\ref{eq:pm}), and the least squares estimates $p_{m}=n_{m}/(Nh)$ coincide with the usual density-normalized histogram.
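This is easy to confirm numerically; the minimal sketch below uses synthetic uniform samples on $[0,1]$ purely as an illustration and compares Eq. (\ref{eq:pm}) with $\lambda=0$ against numpy's density-normalized histogram.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, M = 0.0, 1.0, 10
x = rng.uniform(a, b, size=1000)                      # illustrative sample confined to [a, b]

h = (b - a) / M                                       # bin width
n, _ = np.histogram(x, bins=M, range=(a, b))          # bin counts n_m
p = n / (x.size * h)                                  # p_m = n_m / (N h), Eq. (pm) with lambda = 0

p_np, _ = np.histogram(x, bins=M, range=(a, b), density=True)
print(np.allclose(p, p_np))                           # True
print(h * p.sum())                                    # 1.0, normalization (norm-cond)
```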
6. Constrained vs. unconstrained fitting
Why do we need Lagrange multipliers if we can always convert the constrained optimization into an unconstrained one?
Indeed, for any function $f(x,\vec{p})$ we can define
\begin{equation}
\varphi(x,\vec{p})=\frac{f(x,\vec{p})}{\mathcal{N}(\vec{p})}\label{eq:unconstraint}
\end{equation}
where
\[
\mathcal{N}(\vec{p})=\intop_{a}^{b}f(x,\vec{p})dx
\]
The function $\varphi(x,\vec{p})$ is normalized, and we can use least squares fitting without a constraint.
However, the gradient in (\ref{eq:with_constr}) now is
\begin{equation}
\vec{g}\left(x,\vec{p}\right)=\frac{\vec{\nabla}f\left(x,\vec{p}\right)}{\mathcal{N}(\vec{p})}-\frac{f(x,\vec{p})}{\left[\mathcal{N}(\vec{p})\right]^{2}}\vec{\nabla}\mathcal{N}\left(\vec{p}\right)
\end{equation}
and the calculations are usually more complex.
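For illustration, the sketch below applies the normalization (\ref{eq:unconstraint}) to a hypothetical unnormalized fitting function (an unscaled Gaussian bump on $[a,b]$) and fits it to the same illustrative target as above with a derivative-free method, which sidesteps the extra $\vec{\nabla}\mathcal{N}(\vec{p})$ term; all names and parameter choices are assumptions made for the example.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.stats import beta

a, b = 0.0, 1.0
rho = beta(2.0, 5.0).pdf                          # illustrative target density

def f_raw(x, p):
    mu, sigma = p
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)  # unnormalized fitting function

def objective(p):                                  # Eq. (to min-4) applied to phi = f / N(p)
    norm = quad(lambda t: f_raw(t, p), a, b)[0]    # N(p) from Eq. (unconstraint)
    phi = lambda x: f_raw(x, p) / norm
    return quad(lambda x: phi(x) ** 2 - 2.0 * phi(x) * rho(x), a, b)[0]

res = minimize(objective, x0=[0.3, 0.3], method="Nelder-Mead")
print(res.x)                                       # location and width of the fitted bump
```

With a gradient-based optimizer one would need the two-term gradient above, which is where the extra complexity comes from.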
References
1. Density estimation.
2. Lagrange multiplier.
3. Histogram, Wikipedia.
4. Histogram: definitions and properties.
5. Empirical density functions.
6. B. W. Silverman, Density Estimation for Statistics and Data Analysis.
7. Nonparametric Estimation of Probability Density Functions.
8. A. R. Mugdadi, I. A. Ahmad, A bandwidth selection for kernel density estimation of functions of random variables, Computational Statistics & Data Analysis, v. 47, 2004, pp. 49-62.