Empirical density functions

Nikolai Shokhirev

January 14, 2014


Here we discuss two somewhat opposite approaches to estimating a probability density function from experimental data.

Empirical Density Functions

$N$ measurements of $x$ yield the sequence of sample measurements $x_{i}\,,\, i=1,\ldots,\, N$. Without any additional assumptions, the probability density function (PDF) is \begin{equation} \rho_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}\delta(x-x_{i})\label{eq:rhoe} \end{equation} Here $\delta(x)$ is the Dirac delta function [1]. The density (\ref{eq:rhoe}) is also called a raw density function [2]. The PDF (\ref{eq:rhoe}) is normalized: each of the $N$ delta functions integrates to one, and the sum is divided by $N$.

The corresponding cumulative probability function is

\begin{equation} P_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}H(x-x_{i}) \end{equation} Here $H(x)$ is the Heaviside step function [3]. For any function $f(x)$, this PDF gives the average value \begin{equation} \left\langle \, f\,\right\rangle =\intop_{a}^{b}f(x)\rho_{e}(x)dx=\frac{1}{N}\sum_{i=1}^{N}f(x_{i}) \end{equation} where the interval $[a,b]$ is assumed to contain all the sample points $x_{i}$. In particular, the moments are \[ \left\langle x^{n}\right\rangle =\frac{1}{N}\sum_{i=1}^{N}x_{i}^{n} \]
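The formulas above can be evaluated directly from the sample. The following is a minimal sketch, assuming NumPy is available. The function names `empirical_cdf` and `empirical_moment` are hypothetical, and the convention $H(0)=1$ is an assumption, since the text does not fix the value of the step function at zero.

```python
import numpy as np

def empirical_cdf(sample, x):
    """P_e(x) = (1/N) * sum_i H(x - x_i), assuming the convention H(0) = 1."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # With side="right", searchsorted counts the sample points x_i <= x.
    return np.searchsorted(sample, x, side="right") / sample.size

def empirical_moment(sample, n):
    """<x^n> = (1/N) * sum_i x_i^n."""
    return np.mean(np.asarray(sample, dtype=float) ** n)

# Illustrative data only, not taken from the text.
sample = [1.0, 2.0, 2.0, 4.0]
print(empirical_cdf(sample, [0.0, 2.0, 5.0]))  # fraction of points <= each x
print(empirical_moment(sample, 1), empirical_moment(sample, 2))
```

Sorting once and using a binary search keeps each evaluation of $P_{e}(x)$ at $O(\log N)$ instead of summing $N$ step functions.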


Kernel density estimation

The kernel density estimator is \begin{equation} \rho_{h}(x)=\frac{1}{N}\sum_{i=1}^{N}K_{h}(x-x_{i})\label{eq:rhoh} \end{equation} Here \[ K_{h}(x)=\frac{1}{h}K\left(\frac{x}{h}\right) \] is the scaled kernel, $K(x)$ is a symmetric function normalized so that \[ \int K(x)dx=1 \] and $h>0$ is a smoothing parameter, or bandwidth.
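As an illustration, the estimator (\ref{eq:rhoh}) can be sketched in a few lines. The Gaussian kernel used here is an assumption, since the text only requires $K$ to be symmetric and normalized. The names `gaussian_kernel` and `kde` and the bandwidth value are likewise hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """A symmetric kernel with unit integral (assumed choice of K)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(sample, x, h, kernel=gaussian_kernel):
    """rho_h(x) = (1/N) * sum_i K_h(x - x_i), with K_h(x) = K(x/h) / h."""
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # u[j, i] = (x_j - x_i) / h for every grid point j and sample point i.
    u = (x[:, None] - sample[None, :]) / h
    return kernel(u).sum(axis=1) / (sample.size * h)

# Illustrative data and bandwidth only, not taken from the text.
sample = [1.0, 2.0, 2.0, 4.0]
grid = np.linspace(-10.0, 15.0, 5001)
density = kde(sample, grid, h=0.5)

# Riemann-sum check of the normalization; the result should be close to 1.
dx = grid[1] - grid[0]
print(density.sum() * dx)
```

Because each $K_{h}(x-x_{i})$ integrates to one, the estimate stays normalized for any $h>0$. The bandwidth only controls how strongly the individual sample points are smeared.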


  1. The Dirac delta function.
  2. Kernel bandwidth optimization in spike rate estimation.
  3. The Heaviside step function.
  4. Density estimation.
  5. Histogram.
  6. A bandwidth selection for kernel density estimation of functions of random variables, A.R. Mugdadi, Ibrahim A. Ahmad. Computational Statistics & Data Analysis, v. 47, 2004, 49-62.

© Nikolai Shokhirev, 2012-2017
