# Histogram function

## Introduction

Histogram functions [1] are widely used in probability density function (PDF) estimation [2].

## Definitions

A histogram is a piecewise-constant function:

$$f\left(x,\vec{p}\right)=\sum_{m=0}^{M-1}p_{m}\Pi_{m}(x)\,,\: a\leq x\leq b\label{eq:hist_def}$$

Here

$$\Pi_{m}(x)=\Pi^{(h)}(x-a-mh)$$

where

$$\Pi^{(h)}(x)=\begin{cases} 0 , & x \lt 0 \\ 1 , & 0 \le x \lt h \\ 0 , & h \le x \end{cases}\label{eq:Pi-func}$$

is a rectangular function of width (bin size) $h$, and

$$h=\frac{b-a}{M}$$

The shift by $a$ places the first bin at the left end of the interval $[a,b]$, so the $m$-th bin covers $a+mh\leq x\lt a+(m+1)h$. The normalization condition

$$\mathcal{N}(\vec{p})=\intop_{a}^{b}f\left(x,\vec{p}\right)dx=1\label{eq:norm-cond}$$

reduces to

$$\mathcal{N}(\vec{p})=h\sum_{m=0}^{M-1}p_{m}=1\label{eq:norm}$$

because each $\Pi_{m}$ integrates to $h$. It is also required that $p_{m}\geq0$ for all $m$ if (\ref{eq:hist_def}) represents a probability density function.
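The definition above can be illustrated with a minimal Python sketch. The function names `hist_value` and `norm` are hypothetical, chosen only for this example. The sketch assumes that the bins start at $a$, so that the $m$-th bin covers $a+mh\leq x\lt a+(m+1)h$, and it assigns the right endpoint $x=b$ to the last bin, which is a convention adopted here rather than part of the definition.

```python
import math

def hist_value(x, p, a, b):
    """Evaluate f(x, p) for M = len(p) equal-width bins on [a, b]."""
    M = len(p)
    h = (b - a) / M                      # bin size h = (b - a) / M
    if x < a or x > b:
        raise ValueError("x must lie in [a, b]")
    # Index m of the bin containing x; x == b is mapped to the last bin
    # (an assumption of this sketch).
    m = min(int((x - a) / h), M - 1)
    return p[m]

def norm(p, a, b):
    """Normalization N(p) = h * sum_m p_m."""
    h = (b - a) / len(p)
    return h * sum(p)

# Example values (illustrative only): M = 4 bins on [0, 2], so h = 0.5.
a, b = 0.0, 2.0
p = [0.2, 0.6, 0.8, 0.4]
total = norm(p, a, b)                    # should be close to 1
```

Computing the bin index by division can be sensitive to rounding when $x$ lies very close to a bin edge; for a density this affects only a set of measure zero.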

## Properties

Note that (\ref{eq:hist_def}) is a linear, and therefore smooth, function of each $p_{m}$, and the derivatives are

$$\frac{\partial}{\partial p_{m}}f\left(x,\vec{p}\right)=\Pi_{m}(x)\label{eq:hist_deriv}$$

This property is used in PDF fitting. Integrating the product of $\Pi_{m}$ with any integrable function $y(x)$ gives $h$ times the average value of $y$ over the $m$-th interval:

$$\intop_{a}^{b}\Pi_{m}(x)y(x)dx=\intop_{a+mh}^{a+(m+1)h}y(x)dx=h\left\langle \, y\,\right\rangle _{m}\label{eq:avg}$$

The $\Pi$ functions are orthogonal:

$$\Pi_{m}(x)\Pi_{k}(x)=\delta_{m,k}\Pi_{m}(x)\label{eq:prod}$$

and

$$\intop_{a}^{b}\Pi_{m}(x)\Pi_{k}(x)dx=h\delta_{m,k}\label{eq:ort}$$
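The averaging and orthogonality relations can be checked numerically. The following is a rough sketch, not a fitting routine: `box` and `integrate` are hypothetical helper names, the bins are assumed to be the half-open intervals starting at $a$, and the integrals are approximated with a simple midpoint rule whose grid is aligned with the bin edges.

```python
import math

def box(m, x, a, h):
    """Pi_m(x): 1 on [a + m*h, a + (m+1)*h), 0 elsewhere."""
    return 1.0 if a + m * h <= x < a + (m + 1) * h else 0.0

def integrate(g, lo, hi, n=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / n
    return dx * sum(g(lo + (i + 0.5) * dx) for i in range(n))

# Example values (illustrative only).
a, b, M = 0.0, 2.0, 4
h = (b - a) / M

# Orthogonality: the integral of Pi_m * Pi_k should equal h * delta_{m,k}.
gram = [[integrate(lambda x: box(m, x, a, h) * box(k, x, a, h), a, b)
         for k in range(M)] for m in range(M)]

# Averaging: the integral of Pi_1 * y over [a, b] should equal the integral
# of y over bin 1, i.e. h times the bin average, here for y(x) = x**2.
lhs = integrate(lambda x: box(1, x, a, h) * x * x, a, b)
rhs = integrate(lambda x: x * x, a + h, a + 2 * h, n=1000)
```

With these values `gram` should be close to $h$ times the identity matrix, and `lhs` and `rhs` should agree up to rounding.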

## Generalization

In Eq. (\ref{eq:hist_def}) we can relax the requirement that all $\Pi$-functions have equal width. The definition (\ref{eq:Pi-func}) is replaced with

$$\Pi_{m}(x)=\begin{cases} 0, & x \lt x_{m}\\ 1, & x_{m}\leq x \lt x_{m+1}\\ 0, & x_{m+1}\leq x \end{cases}$$

Here

$$a=x_{0}\lt x_{1} \lt \cdots \lt x_{M-1}\lt x_{M}=b$$

and the widths are

$$h_{m}=x_{m+1}-x_{m}$$

Equations (\ref{eq:hist_def}), (\ref{eq:hist_deriv}) and (\ref{eq:prod}) remain unchanged. Eq. (\ref{eq:norm}) becomes

$$\mathcal{N}(\vec{p})=\sum_{m=0}^{M-1}h_{m}\,p_{m}=1\label{eq:norm-1}$$

In Eqs. (\ref{eq:avg}) and (\ref{eq:ort}), $h$ should be replaced with $h_{m}$, and the integration limits in (\ref{eq:avg}) become $x_{m}$ and $x_{m+1}$.
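For unequal bin widths, the bin containing $x$ can be found by a binary search over the edges $x_{0},\ldots,x_{M}$. The sketch below uses Python's standard `bisect` module; the names `hist_value_edges` and `norm_edges` are hypothetical, and, as an assumption of this example, the right endpoint $x=b$ is assigned to the last bin.

```python
import bisect
import math

def hist_value_edges(x, p, edges):
    """Evaluate f(x, p) for bins [edges[m], edges[m+1]), m = 0..M-1."""
    if len(edges) != len(p) + 1:
        raise ValueError("need exactly len(p) + 1 bin edges")
    if x < edges[0] or x > edges[-1]:
        raise ValueError("x must lie in [a, b]")
    # bisect_right counts the edges that are <= x, so the bin index is
    # that count minus one; x == b is clamped into the last bin.
    m = min(bisect.bisect_right(edges, x) - 1, len(p) - 1)
    return p[m]

def norm_edges(p, edges):
    """Normalization N(p) = sum_m h_m * p_m with h_m = x_{m+1} - x_m."""
    return sum((edges[m + 1] - edges[m]) * p[m] for m in range(len(p)))

# Example values (illustrative only): widths 0.5, 0.5 and 1.0 on [0, 2].
edges = [0.0, 0.5, 1.0, 2.0]
p = [0.4, 0.8, 0.4]
total = norm_edges(p, edges)             # should be close to 1
```

The edges must be strictly increasing for the search to be valid, which matches the ordering $x_{0}\lt x_{1}\lt\cdots\lt x_{M}$ required above.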

## Histogram Bin-width Optimization

See the links below.

### References

© Nikolai Shokhirev, 2012-2017

email: nikolai(dot)shokhirev(at)gmail(dot)com
