Empirical density functions
Nikolai Shokhirev
January 14, 2014
Introduction
Here we discuss two somewhat opposite approaches to estimating a probability density function from experimental data: the empirical (raw) density and kernel density estimation.
Empirical Density Functions
The outcome of $N$ measurements of $x$ yields the sequence $x_{i}\,,\, i=1,\ldots,\, N$ (Sample measurements).
Without any additional assumptions the probability density function (PDF) is
\begin{equation}
\rho_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}\delta(x-x_{i})\label{eq:rhoe}
\end{equation}
Here $\delta(x)$ is the Dirac delta function [1].
The density (\ref{eq:rhoe}) is also called a raw density function [2].
The PDF (\ref{eq:rhoe}) is obviously normalized.
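As a short check, assuming the integral is taken over the whole real line, each delta function integrates to one, so
\[
\int_{-\infty}^{\infty}\rho_{e}(x)\,dx=\frac{1}{N}\sum_{i=1}^{N}\int_{-\infty}^{\infty}\delta(x-x_{i})\,dx=\frac{1}{N}\sum_{i=1}^{N}1=1
\]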
The corresponding cumulative probability function is
\begin{equation}
P_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}H(x-x_{i})
\end{equation}
Here $H(x)$ is the Heaviside step function [3].
For any function $f(x)$, this PDF gives the following average value, where the integration interval $[a,b]$ is assumed to contain all sample points:
\begin{equation}
\left\langle \, f\,\right\rangle =\intop_{a}^{b}f(x)\rho_{e}(x)dx=\frac{1}{N}\sum_{i=1}^{N}f(x_{i})
\end{equation}
In particular, the moments are
\[
\left\langle x^{n}\right\rangle =\frac{1}{N}\sum_{i=1}^{N}x_{i}^{n}
\]
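With the empirical density, averages reduce to plain sample means. The sketch below illustrates this in Python with NumPy; the helper names `empirical_average` and `empirical_moment` are hypothetical, and `f` is assumed to accept a NumPy array.

```python
import numpy as np

def empirical_average(samples, f):
    """Return <f> = (1/N) * sum_i f(x_i); f must work elementwise on arrays."""
    samples = np.asarray(samples, dtype=float)
    return np.mean(f(samples))

def empirical_moment(samples, n):
    """Return the n-th raw moment <x^n> = (1/N) * sum_i x_i**n."""
    return empirical_average(samples, lambda x: x**n)

samples = [1.0, 2.0, 3.0, 4.0]
mean = empirical_moment(samples, 1)      # (1 + 2 + 3 + 4) / 4
second = empirical_moment(samples, 2)    # (1 + 4 + 9 + 16) / 4
variance = second - mean**2              # <x^2> - <x>^2
print(mean, second, variance)
```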
Remarks
- Obviously the function (\ref{eq:rhoe}) is not smooth, but the sample measurements do not give information about smoothness.
- Anything beyond this formula is based on some assumptions, theories or other experiments.
- Eq. (\ref{eq:rhoe}) is a truly non-parametric estimate of the probability density function.
Kernel density estimation
The kernel density estimator is
\begin{equation}
\rho_{h}(x)=\frac{1}{N}\sum_{i=1}^{N}K_{h}(x-x_{i})\label{eq:rhoh}
\end{equation}
Here
\[
K_{h}(x)=\frac{1}{h}K\left(\frac{x}{h}\right)
\]
is the scaled kernel, $K(x)$ is a symmetric function normalized so that
\[
\int K(x)dx=1
\]
and $h$ is a smoothing parameter or bandwidth.
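The estimator (\ref{eq:rhoh}) can be sketched in a few lines of Python with NumPy. The Gaussian kernel used here is only one possible choice of $K$ (the text does not prescribe one), and the names `gaussian_kernel` and `kernel_density` are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: symmetric and integrates to one."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_density(samples, x, h, kernel=gaussian_kernel):
    """Evaluate rho_h(x) = (1/N) * sum_i K_h(x - x_i), K_h(x) = K(x/h) / h."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    samples = np.asarray(samples, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # u[j, i] = (x_j - x_i) / h for every evaluation point and every sample
    u = (x[:, None] - samples[None, :]) / h
    return kernel(u).mean(axis=1) / h

samples = [-1.0, 0.0, 0.5, 2.0]
grid = np.linspace(-10.0, 10.0, 2001)
rho = kernel_density(samples, grid, h=0.5)

# Numerical check of normalization with a simple Riemann sum
area = np.sum(rho) * (grid[1] - grid[0])
print(area)
```

As $h\rightarrow0$ each term narrows toward a delta function and the estimate approaches the empirical density (\ref{eq:rhoe}), while a larger $h$ gives a smoother curve.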
References
- The Dirac delta function.
- Kernel bandwidth optimization in spike rate estimation.
- The Heaviside step function.
- Density estimation.
- Histogram.
- A bandwidth selection for kernel density estimation of functions of random variables, A.R. Mugdadi, Ibrahim A. Ahmad. Computational Statistics & Data Analysis, v. 47, 2004, 49-62.