Empirical density functions

Nikolai Shokhirev

January 14, 2014

Introduction

Here we discuss two somewhat opposite approaches to estimating a probability density function from experimental data.

Empirical Density Functions

$N$ measurements of $x$ yield the sequence $x_{i}\,,\, i=1,\ldots,\, N$ (the sample measurements). Without any additional assumptions, the probability density function (PDF) is \begin{equation} \rho_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}\delta(x-x_{i})\label{eq:rhoe} \end{equation} Here $\delta(x)$ is the Dirac delta function [1]. The density (\ref{eq:rhoe}) is also called a raw density function [2]. The PDF (\ref{eq:rhoe}) is normalized: each delta function integrates to one, and the sum of $N$ such terms is divided by $N$.

The corresponding cumulative probability function is

\begin{equation} P_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}H(x-x_{i}) \end{equation} Here $H(x)$ is the Heaviside step function [3]. For any function $f(x)$ this PDF gives the following average value, provided the interval $[a,b]$ contains all the samples: \begin{equation} \left\langle \, f\,\right\rangle =\intop_{a}^{b}f(x)\rho_{e}(x)dx=\frac{1}{N}\sum_{i=1}^{N}f(x_{i}) \end{equation} In particular, the moments are \[ \left\langle x^{n}\right\rangle =\frac{1}{N}\sum_{i=1}^{N}x_{i}^{n} \]
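
The empirical cumulative function and the moments above reduce to counting and averaging over the sample, which the following minimal Python sketch illustrates. The function names `empirical_cdf` and `empirical_moment` are illustrative and not part of the original text, and the convention $H(0)=1$ is an assumption (the value of the step function at zero is not fixed above).

```python
from bisect import bisect_right


def empirical_cdf(samples, x):
    """P_e(x) = (1/N) * sum_i H(x - x_i), assuming the convention H(0) = 1."""
    ordered = sorted(samples)
    # bisect_right returns the number of samples x_i with x_i <= x
    return bisect_right(ordered, x) / len(ordered)


def empirical_moment(samples, n):
    """<x^n> = (1/N) * sum_i x_i^n."""
    return sum(xi ** n for xi in samples) / len(samples)


if __name__ == "__main__":
    data = [1.0, 2.0, 2.0, 4.0]       # made-up sample, for illustration only
    print(empirical_cdf(data, 2.0))   # fraction of samples <= 2.0
    print(empirical_moment(data, 1))  # sample mean
    print(empirical_moment(data, 2))  # second raw moment
```

Sorting on every call keeps the sketch short; for repeated evaluation one would sort the sample once and reuse it.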

Remarks

Kernel density estimation

The kernel density estimator is \begin{equation} \rho_{h}(x)=\frac{1}{N}\sum_{i=1}^{N}K_{h}(x-x_{i})\label{eq:rhoh} \end{equation} Here \[ K_{h}(x)=\frac{1}{h}K\left(\frac{x}{h}\right) \] is the scaled kernel, $K(x)$ is a symmetric function normalized so that \[ \int K(x)dx=1 \] and $h>0$ is a smoothing parameter, or bandwidth.
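
As an illustration of the estimator $\rho_{h}(x)$, here is a minimal Python sketch. The Gaussian kernel is an assumption made for the example; the text above only requires $K$ to be symmetric and normalized. The names `gaussian_kernel` and `kde` are illustrative.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: symmetric and integrates to one (assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(samples, x, h, kernel=gaussian_kernel):
    """rho_h(x) = (1/N) * sum_i K_h(x - x_i), with K_h(x) = K(x/h) / h."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(kernel((x - xi) / h) for xi in samples) / (len(samples) * h)


if __name__ == "__main__":
    data = [1.0, 2.0, 2.0, 4.0]  # made-up sample, for illustration only
    for h in (0.2, 1.0):
        print(h, kde(data, 2.0, h))
```

A smaller $h$ concentrates each term around its sample, so the estimate approaches the raw density of delta functions; a larger $h$ gives a smoother curve.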

References

  1. The Dirac delta function.
  2. Kernel bandwidth optimization in spike rate estimation.
  3. The Heaviside step function.
  4. Density estimation.
  5. Histogram.
  6. A bandwidth selection for kernel density estimation of functions of random variables. A.R. Mugdadi, Ibrahim A. Ahmad. Computational Statistics & Data Analysis, v. 47, 2004, 49-62.


© Nikolai Shokhirev, 2012-2017

email: nikolai(dot)shokhirev(at)gmail(dot)com
