Multivariate correlations

Nikolai Shokhirev

- Definitions
- Distributions
- Multivar correlations
- Principal component analysis

Problem statement

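In science and technology, systems (objects) are characterized by a finite set of parameters $x_i$, $i = 1, \ldots, N$. The measurements of these parameters can be arranged into a rectangular N × M matrix, where $x_{i,m}$ is the value of the i-th parameter in the m-th experiment:

$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{pmatrix} \tag{1}$$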


Here M is the number of experiments. Each experiment corresponds to the measurement of one system, sample, individual, etc. All such terms are used interchangeably.


| Systems | Parameters |
| --- | --- |
| Human individuals | Age, sex, education, income, weight, height, etc. |
| Chemical solutions | Spectral intensities at selected wavelengths |
| Microchips in a control sample | Voltage and current at certain pins |
| Clinical test participants | Lab test results |
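As a minimal sketch of this layout, the made-up measurements below are stored with one row per parameter and one column per experiment, which is the orientation implied by the index order "i-th parameter of the m-th system" used later in the text. The names `records`, `parameters` and `X`, and all numbers, are illustrative assumptions and are not taken from the tutorial's programs.

```python
# Hypothetical measurements: one record per experiment (individual).
records = [
    {"age": 25.0, "height": 170.0, "weight": 65.0},
    {"age": 32.0, "height": 182.0, "weight": 80.0},
    {"age": 47.0, "height": 165.0, "weight": 72.0},
    {"age": 51.0, "height": 175.0, "weight": 90.0},
]
parameters = ["age", "height", "weight"]

# X[i][m] is the i-th parameter measured in the m-th experiment,
# so X has N rows (parameters) and M columns (experiments).
X = [[rec[name] for rec in records] for name in parameters]

N = len(X)      # number of parameters
M = len(X[0])   # number of experiments
print(N, M)     # 3 4
print(X[1])     # heights of all individuals
```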


Mean values

The sample (population) mean vector of the parameters, μ, has the components:

$$\mu_i = \frac{1}{M} \sum_{m=1}^{M} x_{i,m}, \qquad i = 1, \ldots, N \tag{2}$$


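For each measurement m, the vector of deviations is defined as:

$$d_m = x_m - \mu, \qquad m = 1, \ldots, M \tag{3}$$

Here $x_m$ is the vector of the N parameter values obtained in the m-th experiment.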


In clinical research, for example, one of the components of μ may be the average patient temperature in a hospital. The deviation of an individual patient from this average is obviously more informative than the average itself.
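The mean vector and the deviation vectors can be sketched in a few lines of plain Python. The data values and the helper name `deviation` are hypothetical. The layout (rows are parameters, columns are experiments) is an assumption consistent with the index order used in the text.

```python
# Hypothetical data: N = 2 parameters (rows), M = 4 experiments (columns).
X = [
    [36.6, 37.2, 38.1, 36.5],   # e.g. body temperature
    [60.0, 72.0, 88.0, 64.0],   # e.g. pulse rate
]
N, M = len(X), len(X[0])

# Mean vector: average each parameter over the M experiments.
mu = [sum(row) / M for row in X]

def deviation(m):
    """Deviation vector of the m-th experiment (0-based): x_m - mu."""
    return [X[i][m] - mu[i] for i in range(N)]

print(mu)
print(deviation(2))
```

By construction, the deviations of each parameter sum to zero over all experiments, which is a quick sanity check for any implementation.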

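The deviation vectors form the columns of the matrix D, which has the same shape as the initial matrix X:

$$D = (d_1, d_2, \ldots, d_M), \qquad d_{i,m} = x_{i,m} - \mu_i \tag{4}$$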


Covariance matrix

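The sample covariance matrix is defined through the averaged products of the components of the deviation vectors:

$$c_{i,j} = \frac{1}{M} \sum_{m=1}^{M} d_{i,m}\, d_{j,m} \tag{5}$$

Here $d_{i,m}$ is the deviation of the i-th parameter for the m-th system.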

Eq. (5) can be rewritten in the following matrix form:

$$C = \frac{1}{M} D D^{T} \tag{6}$$


The superscript "T" denotes the matrix transposition.
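The equivalence of the element-wise definition and the matrix form can be checked numerically. The sketch below assumes the 1/M normalization (averaged products) and an N × M deviation matrix with one column per experiment. The matrix `D` is made-up data whose rows sum to zero.

```python
# Hypothetical deviation matrix D: N = 2 parameters, M = 4 experiments.
# Each row sums to zero, as deviations from the mean must.
D = [
    [-1.0, 0.0, 2.0, -1.0],
    [-2.0, 1.0, 3.0, -2.0],
]
N, M = len(D), len(D[0])

# Element-wise definition: c_ij = (1/M) * sum over m of d_im * d_jm.
C_elem = [[sum(D[i][m] * D[j][m] for m in range(M)) / M
           for j in range(N)] for i in range(N)]

# Matrix form: C = (1/M) * D * D^T, written with an explicit transpose.
DT = [list(col) for col in zip(*D)]   # M x N transpose of D
C_mat = [[sum(D[i][m] * DT[m][j] for m in range(M)) / M
          for j in range(N)] for i in range(N)]

print(C_elem)   # [[1.5, 2.5], [2.5, 4.5]]
```

The result is a symmetric N × N matrix, independent of the number of experiments M.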


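The unbiased (Bessel-corrected) estimate of the covariance matrix, denoted here by $C_{ML}$, differs from the above definition by the factor M/(M-1):

$$C_{ML} = \frac{M}{M-1}\, C = \frac{1}{M-1} D D^{T} \tag{7}$$

Note that for normally distributed data the maximum likelihood estimate is the 1/M form of Eq. (5); the 1/(M-1) form is the unbiased one.

The advantage of this definition is that its i-th diagonal element is an unbiased estimate of the variance of the i-th parameter:

$$\sigma_i^2 = (C_{ML})_{i,i} = \frac{1}{M-1} \sum_{m=1}^{M} d_{i,m}^2 \tag{8}$$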
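The effect of the M/(M-1) factor on the diagonal elements can be illustrated with Python's standard `statistics` module, whose `variance` function divides by M-1 and whose `pvariance` function divides by M. The data are made up, and the variable name `C_ML` simply mirrors the notation of the text.

```python
import statistics

# Hypothetical data: N = 2 parameters (rows), M = 4 experiments (columns).
X = [
    [2.0, 4.0, 4.0, 6.0],
    [1.0, 3.0, 2.0, 6.0],
]
N, M = len(X), len(X[0])

mu = [sum(row) / M for row in X]
D = [[X[i][m] - mu[i] for m in range(M)] for i in range(N)]

# Covariance with the 1/M normalization (averaged products).
C = [[sum(D[i][m] * D[j][m] for m in range(M)) / M
      for j in range(N)] for i in range(N)]

# Rescaled covariance: multiply every element by M / (M - 1).
C_ML = [[C[i][j] * M / (M - 1) for j in range(N)] for i in range(N)]

for i in range(N):
    # Diagonal elements versus the library variance estimates.
    print(C[i][i], statistics.pvariance(X[i]))
    print(C_ML[i][i], statistics.variance(X[i]))
```

Each pair of printed numbers should agree up to rounding error.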


Correlation coefficient

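Regardless of the covariance definition, the correlation coefficients are:

$$\rho_{i,j} = \frac{c_{i,j}}{\sqrt{c_{i,i}\, c_{j,j}}} \tag{9}$$

or, in matrix form:

$$R = S^{-1} C\, S^{-1} \tag{10}$$

Here S is a diagonal matrix with the following matrix elements:

$$s_{i,j} = \delta_{i,j} \sqrt{c_{i,i}} \tag{11}$$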
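A minimal sketch of the correlation matrix, assuming the standard definition in which each covariance element is divided by the square roots of the two corresponding diagonal elements (equivalently, the covariance matrix is multiplied on both sides by the inverse of a diagonal matrix holding those square roots). The covariance values and the helper name `correlation` are hypothetical. The last lines illustrate why the choice of covariance normalization does not matter: a common factor cancels.

```python
import math

def correlation(C):
    """Correlation matrix from a covariance matrix C (list of lists)."""
    n = len(C)
    # Inverse of the diagonal scaling matrix, stored as a vector.
    s_inv = [1.0 / math.sqrt(C[i][i]) for i in range(n)]
    return [[s_inv[i] * C[i][j] * s_inv[j] for j in range(n)]
            for i in range(n)]

# Hypothetical covariance matrix (1/M normalization), M = 4 experiments.
M = 4
C = [
    [1.5, 2.5],
    [2.5, 4.5],
]
R = correlation(C)

# The same matrix rescaled by M / (M - 1) gives the same correlations.
C_scaled = [[c * M / (M - 1) for c in row] for row in C]
R_scaled = correlation(C_scaled)

print(R)
print(R_scaled)
```

The diagonal elements of the result equal 1 up to rounding, and the off-diagonal elements lie between -1 and 1.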


The correlation coefficient measures the quality of a linear least-squares fit between a pair of parameters in the original data. The closer its absolute value is to 1, the better the linear fit.
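The link to least squares can be made concrete for a pair of parameters: the squared correlation coefficient equals the coefficient of determination of the straight-line fit. The sketch below checks this numerically on made-up data. The variable names are illustrative.

```python
import math

# Hypothetical measurements of two parameters in M = 5 experiments.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
M = len(x)

mx, my = sum(x) / M, sum(y) / M
dx = [v - mx for v in x]
dy = [v - my for v in y]

# Covariance elements with the 1/M normalization.
cxx = sum(a * a for a in dx) / M
cyy = sum(b * b for b in dy) / M
cxy = sum(a * b for a, b in zip(dx, dy)) / M

# Correlation coefficient of the pair (x, y).
r = cxy / math.sqrt(cxx * cyy)

# Least-squares line y = a + b * x and its coefficient of determination.
b = cxy / cxx
a = my - b * mx
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum(d * d for d in dy)
r_squared = 1.0 - ss_res / ss_tot

print(r ** 2, r_squared)   # the two values agree up to rounding
```

A value of r close to +1 or -1 therefore means that the residuals of the linear fit are small compared with the spread of the data.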


This approach is implemented in a program called "Correlations", which is available in the Download section below. You can also use the more general "Stat Analysis" program.

Remark: In "Correlations" the meaning of the columns and rows is opposite to that of the tutorial.


Download

  1. Program "Correlations".
  2. Program "Stat Analysis".


  1. Correlation and dependence.




© Nikolai Shokhirev, 2001 - 2017