# Empirical density functions

## Introduction

Here we discuss two somewhat opposite approaches to estimating a probability density function from experimental data.

## Empirical Density Functions

The outcome of $N$ measurements of $x$ is the sequence $x_{i}\,,\, i=1,\ldots,\, N$ (the sample measurements). Without any additional assumptions, the probability density function (PDF) is $$\rho_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}\delta(x-x_{i})\label{eq:rhoe}$$ Here $\delta(x)$ is the Dirac delta function [1]. The density (\ref{eq:rhoe}) is also called a raw density function [2]. The PDF (\ref{eq:rhoe}) is normalized to unity.
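
As a brief check of the normalization, integrate (\ref{eq:rhoe}) over the whole real line and use the fact that each delta function integrates to one:

$$\int_{-\infty}^{\infty}\rho_{e}(x)\,dx=\frac{1}{N}\sum_{i=1}^{N}\int_{-\infty}^{\infty}\delta(x-x_{i})\,dx=\frac{1}{N}\cdot N=1$$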

The corresponding cumulative probability function is

$$P_{e}(x)=\frac{1}{N}\sum_{i=1}^{N}H(x-x_{i})$$ Here $H(x)$ is the Heaviside step function [3]. For any function $f(x)$ this PDF gives the following average values, assuming the integration interval $(a,b)$ contains all the samples $x_{i}$: $$\left\langle \, f\,\right\rangle =\intop_{a}^{b}f(x)\rho_{e}(x)dx=\frac{1}{N}\sum_{i=1}^{N}f(x_{i})$$ In particular, the moments are $\left\langle x^{n}\right\rangle =\frac{1}{N}\sum_{i=1}^{N}x_{i}^{n}$.
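
The cumulative function and the averages above reduce to counting and summing over the sample, which the following minimal Python sketch illustrates. The function names and the sample values are hypothetical, chosen only for illustration, and the sketch assumes the convention $H(0)=1$, which the text does not specify.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return P_e as a callable: the fraction of samples with x_i <= x."""
    xs = sorted(samples)
    n = len(xs)

    def cdf(x):
        # bisect_right counts the samples with x_i <= x,
        # which corresponds to the assumed convention H(0) = 1.
        return bisect_right(xs, x) / n

    return cdf


def empirical_average(f, samples):
    """<f> = (1/N) * sum of f(x_i) over the sample."""
    return sum(f(x) for x in samples) / len(samples)


def empirical_moment(samples, n):
    """<x^n> = (1/N) * sum of x_i**n over the sample."""
    return empirical_average(lambda x: x ** n, samples)


# Illustrative data, not taken from any experiment.
samples = [2.0, 1.0, 4.0, 1.0]
P = empirical_cdf(samples)

print(P(0.5), P(1.0), P(3.0), P(4.0))  # 0.0 0.5 0.75 1.0
print(empirical_moment(samples, 1))    # 2.0
print(empirical_moment(samples, 2))    # 5.5
```

The step function jumps by $1/N$ at every sample point, so repeated values (here $x=1$) produce a jump of $2/N$.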

## Remarks

• The function (\ref{eq:rhoe}) is not smooth, but the sample measurements alone carry no information about smoothness.
• Anything beyond this formula relies on additional assumptions, theories, or other experiments.
• Eq. (\ref{eq:rhoe}) is a truly non-parametric estimate of the probability density function.

## Kernel density estimation

The kernel density estimator is $$\rho_{h}(x)=\frac{1}{N}\sum_{i=1}^{N}K_{h}(x-x_{i})\label{eq:rhoh}$$ Here $K_{h}(x)=\frac{1}{h}K\left(\frac{x}{h}\right)$ is the scaled kernel, $K(x)$ is a symmetric function normalized so that $\int K(x)dx=1$, and $h>0$ is the smoothing parameter (bandwidth).
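
The estimator can be evaluated directly from its definition. The sketch below is a minimal illustration and makes several assumptions: it uses a Gaussian kernel (the text does not prescribe a particular $K$), an arbitrary bandwidth $h=0.5$, and made-up sample values. The function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: symmetric and integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(samples, h, kernel=gaussian_kernel):
    """Return rho_h as a callable: (1/(N*h)) * sum of K((x - x_i)/h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)

    def rho(x):
        return sum(kernel((x - xi) / h) for xi in samples) / (n * h)

    return rho


# Illustrative data and bandwidth, not taken from any experiment.
samples = [2.0, 1.0, 4.0, 1.0]
rho_h = kde(samples, h=0.5)

# Crude Riemann-sum check that rho_h integrates to approximately one.
dx = 0.01
grid = [-5.0 + k * dx for k in range(1501)]  # covers [-5, 10]
total = sum(rho_h(x) for x in grid) * dx
print(round(total, 3))
```

Because $K$ integrates to one, so does every scaled kernel $K_{h}$, and therefore $\rho_{h}$ is normalized for any $h>0$. As $h$ shrinks, each term becomes more sharply peaked around its sample point.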

### References

1. The Dirac delta function.
2. Kernel bandwidth optimization in spike rate estimation.
3. The Heaviside step function.
4. Density estimation.
5. Histogram.
6. A bandwidth selection for kernel density estimation of functions of random variables, A.R. Mugdadi, Ibrahim A. Ahmad. Computational Statistics & Data Analysis, v. 47, 2004, 49-62.