CONVOLUTION AND COPULAS: THEORY AND PRACTICE

  • By Admin
  • April 15, 2015

This technical white paper explains the basics of Convolution Theory and Copula Theory as they apply to probability distributions and stochastic modeling, both in theory and in practice. It attempts to show that, in theory, convolution and copulas are elegant and critical in solving basic distributional moments, but that in practical applications they quickly become impractical and mathematically intractable. Practitioners therefore need to run empirical Monte Carlo simulations, whose results approach the theoretically predicted results at the limit, giving them a powerful practical toolkit for modeling.

Many probability distributions are both flexible and interchangeable. For example:

  • Arcsine and Parabolic distributions are special cases of the Beta distribution.
  • Binomial and Poisson distributions approach the Normal distribution at the limit.
  • Binomial distribution is a Bernoulli distribution with multiple trials.
  • Chi‐Square distribution is the sum of the squares of multiple independent standard Normal distributions.
  • Discrete Uniform distributions, when 12 or more are summed, approach the Normal distribution.
  • Erlang distribution is a special case of the Gamma distribution.
  • Exponential distribution describes the continuous waiting time between events whose counts follow the Poisson distribution.
  • F distribution is the ratio of two Chi‐Square distributions, each divided by its degrees of freedom.
  • Gamma distribution is related to the Lognormal, Exponential, Pascal, Erlang, Poisson, and Chi‐Square distributions.
  • Laplace distribution comprises two Exponential distributions in one.
  • Lognormal distribution’s logarithmic values follow the Normal distribution.
  • Pascal distribution is a shifted Negative Binomial distribution.
  • Pearson V distribution is the inverse of the Gamma distribution.
  • Pearson VI distribution is the ratio of two Gamma distributions.
  • PERT distribution is a modified Beta distribution.
  • Rayleigh distribution is a special case of the Weibull distribution (shape parameter of 2).
  • T distribution with high degrees of freedom (> 30) approaches the Normal distribution.

Mathematicians derived many of these relationships through the use of convolution. As a quick introduction, if there are two independent and identically distributed (i.i.d.) random variables, X and Y, whose known probability density functions (pdf) are $f_X(x)$ and $f_Y(y)$, we can generate a new probability distribution by combining X and Y using basic summation, multiplication, and division. Some examples are listed above: the F distribution is a ratio of two Chi‐Square distributions, the Normal distribution is approached by a sum of multiple Uniform distributions, and so on. To illustrate how this works, consider the cumulative distribution function (cdf) of the sum Z = X + Y, obtained from the joint probability distribution of the two random variables X and Y:

$$F_Z(z) = P(X + Y \le z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x} f_X(x)\, f_Y(y)\, dy\, dx$$

Differentiating the cdf equation above yields the pdf:

$$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx$$

Example 1: The convolution (simple sum) of two identical and independent uniform distributions yields the triangular distribution.

As a simple example, if we take the sum of two i.i.d. uniform distributions with a minimum of 0 and maximum of 1, we have:

$$f_Z(z) = \int_{-\infty}^{\infty} f(x)\, f(z - x)\, dx, \qquad Z = X + Y$$

Because f(x) = 1 when 0 ≤ x ≤ 1 (and 0 otherwise) for a Uniform [0, 1] distribution, we have:

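$$f_Z(z) = \int_{\max(0,\, z - 1)}^{\min(1,\, z)} dx = \begin{cases} z, & 0 \le z \le 1 \\ 2 - z, & 1 < z \le 2 \\ 0, & \text{otherwise} \end{cases}$$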

This is exactly a simple triangular distribution with a minimum of 0, a most likely value of 1, and a maximum of 2.
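
The convolution integral for two Uniform [0, 1] densities can also be checked numerically. The following is a minimal sketch, not part of the original paper: the function names (`uniform_pdf`, `convolve_at`, `triangular_pdf`) and the midpoint-rule integration are illustrative assumptions, and only the Python standard library is used.

```python
def uniform_pdf(x: float) -> float:
    """Density of a Uniform[0, 1] random variable."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve_at(z: float, f, g, lo: float = 0.0, hi: float = 1.0,
                steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f(x) * g(z - x).

    [lo, hi] must cover the support of f (here [0, 1]); this is an
    illustrative helper, not a general-purpose integrator.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += f(x) * g(z - x)
    return total * dx


def triangular_pdf(z: float) -> float:
    """Closed-form density of the sum of two i.i.d. Uniform[0, 1] variables."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0


if __name__ == "__main__":
    for z in (0.25, 0.5, 1.0, 1.5, 1.75):
        numeric = convolve_at(z, uniform_pdf, uniform_pdf)
        print(f"z={z:.2f}  numeric={numeric:.4f}  closed form={triangular_pdf(z):.4f}")
```

The numeric and closed-form columns should agree to within the discretization error of the grid, which shrinks as `steps` increases.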

The figure below shows an empirical approach where two Uniform [0, 1] distributions are simulated for 20,000 trials and summed. The computed empirical sums are then extracted and the raw data fitted using the Kolmogorov‐Smirnov fitting algorithm in Risk Simulator. The triangular distribution appears as the best‐fitting distribution with a 74% goodness of fit. As seen, the convolution of only two uniform distributions results in a simple triangular distribution.

[Figure: Risk Simulator distribution fitting of the simulated sum of two Uniform [0, 1] distributions]
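
A comparable experiment can be sketched outside Risk Simulator. The code below is an assumption-laden illustration rather than the paper's procedure: it draws 20,000 sums with Python's `random` module and reports the Kolmogorov‐Smirnov distance (the largest gap between the empirical and theoretical cdfs) against the triangular cdf. That distance is not the same quantity as the goodness-of-fit percentage reported by Risk Simulator, and the function names and seed are hypothetical.

```python
import random


def triangular_cdf(z: float) -> float:
    """CDF of the sum of two independent Uniform[0, 1] variables."""
    if z <= 0.0:
        return 0.0
    if z <= 1.0:
        return 0.5 * z * z
    if z <= 2.0:
        return 1.0 - 0.5 * (2.0 - z) ** 2
    return 1.0


def ks_distance(samples, cdf) -> float:
    """Kolmogorov-Smirnov statistic: largest gap between the empirical
    CDF of `samples` and the theoretical `cdf`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the theoretical CDF with the empirical step just
        # after and just before each sorted observation.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulate_uniform_sum(trials: int = 20_000, seed: int = 1) -> list:
    """Draw `trials` sums of two independent Uniform[0, 1] variates."""
    rng = random.Random(seed)
    return [rng.random() + rng.random() for _ in range(trials)]


if __name__ == "__main__":
    sums = simulate_uniform_sum()
    print(f"KS distance to the triangular CDF: {ks_distance(sums, triangular_cdf):.4f}")
```

A small distance indicates that the simulated sums track the triangular cdf closely. Rerunning with more trials should generally shrink it further.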

Example 2: The convolution (simple sum) of twelve identical and independent uniform distributions approaches the normal distribution.

If we take the same approach and simulate and sum 12 i.i.d. Uniform [0, 1] distributions, we obtain a very close to perfect Normal distribution as shown below, with a goodness of fit of 99.3% after running 20,000 simulation trials.

[Figure: Risk Simulator distribution fitting of the simulated sum of twelve Uniform [0, 1] distributions]
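
The twelve-uniform case can be sketched in the same spirit. Each Uniform [0, 1] variable has mean 1/2 and variance 1/12, so the sum of twelve has mean 6 and variance 1. The sketch below (function names and seed are illustrative assumptions) compares the simulated sums with a Normal(6, 1) distribution by looking at the share of trials within one, two, and three standard deviations of the mean.

```python
import random
from statistics import NormalDist, fmean, pstdev


def simulate_sum_of_uniforms(n_terms: int = 12, trials: int = 20_000,
                             seed: int = 7) -> list:
    """Draw `trials` sums of `n_terms` independent Uniform[0, 1] variates."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(trials)]


def share_within(samples, center: float, half_width: float) -> float:
    """Fraction of samples lying within center +/- half_width."""
    hits = sum(1 for s in samples if abs(s - center) <= half_width)
    return hits / len(samples)


if __name__ == "__main__":
    sums = simulate_sum_of_uniforms()
    # Theoretical target: mean 12 * 1/2 = 6, variance 12 * 1/12 = 1.
    normal = NormalDist(mu=6.0, sigma=1.0)
    print(f"mean={fmean(sums):.3f}  stdev={pstdev(sums):.3f}")
    for k in (1, 2, 3):
        expected = normal.cdf(6 + k) - normal.cdf(6 - k)
        simulated = share_within(sums, 6.0, k)
        print(f"within {k} sd: simulated={simulated:.4f}  normal={expected:.4f}")
```

The simulated shares should sit close to the Normal values, with small discrepancies in the tails because the sum of twelve uniforms is bounded between 0 and 12.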

Example 3: The convolution (simple sum) of multiple identical and independent exponential distributions yields the gamma (Erlang) distribution.

In this example, we sum two i.i.d. exponential distributions and then generalize to multiple distributions. To get started, we use two identical Exponential [λ = 2] distributions:

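$$f_Z(z) = \int_0^z \lambda e^{-\lambda x}\, \lambda e^{-\lambda (z - x)}\, dx = \lambda^2 z\, e^{-\lambda z}, \qquad z \ge 0$$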

where $f(x) = \lambda e^{-\lambda x}$ is the pdf for the exponential distribution for all x ≥ 0 and λ > 0, and the distribution’s mean is β = 1/λ.

If we generalize to n random i.i.d. exponential distributions and apply mathematical induction:

$$f_{Z_n}(z) = \frac{\lambda^n z^{n-1} e^{-\lambda z}}{(n-1)!}, \qquad Z_n = X_1 + \cdots + X_n,\ z \ge 0$$

This is, of course, the gamma distribution with α = n and β = 1/λ as the shape and scale parameters:

$$f(x) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\Gamma(\alpha)\, \beta^{\alpha}}, \qquad x \ge 0,\ \alpha > 0,\ \beta > 0$$

When the α parameter is a positive integer, the gamma distribution is called the Erlang distribution, which is used to predict waiting times in queuing systems. The Erlang distribution is the sum of independent and identically distributed random variables, each having a memoryless exponential distribution. Setting n as the number of these random variables, the mathematical construct of the Erlang distribution is:

$$f(x) = \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}, \qquad x \ge 0,\ \lambda > 0$$

The empirical approach is shown below, where we have two exponential distributions with λ = 2 (so that the mean is β = 1/λ = 0.5). The sum of these two distributions, after running 20,000 Monte Carlo simulation trials and extracting and fitting the raw simulated sum data, shows a 99.4% goodness of fit to the gamma distribution with α = 2 and β = 0.5 (rounded), corresponding to n = 2 and λ = 2.

[Figure: Risk Simulator distribution fitting of the simulated sum of two Exponential [λ = 2] distributions]
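
The exponential example can be sketched as follows, assuming only the Python standard library. The code sums n = 2 draws from an Exponential distribution with λ = 2 and compares the empirical cdf of the sums with the closed-form Erlang cdf, $1 - e^{-\lambda x} \sum_{k=0}^{n-1} (\lambda x)^k / k!$. The helper names, seed, and evaluation points are illustrative assumptions, not taken from the paper.

```python
import math
import random


def erlang_cdf(x: float, n: int, lam: float) -> float:
    """CDF of the sum of n i.i.d. Exponential(lam) variables (Erlang)."""
    if x <= 0.0:
        return 0.0
    partial = sum((lam * x) ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-lam * x) * partial


def simulate_exponential_sum(n: int = 2, lam: float = 2.0,
                             trials: int = 20_000, seed: int = 11) -> list:
    """Draw `trials` sums of n independent Exponential(lam) variates."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]


def empirical_cdf(samples, x: float) -> float:
    """Fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


if __name__ == "__main__":
    n, lam = 2, 2.0
    sums = simulate_exponential_sum(n, lam)
    sample_mean = sum(sums) / len(sums)
    print(f"sample mean={sample_mean:.3f}  theoretical mean n/lam={n / lam:.3f}")
    for x in (0.25, 0.5, 1.0, 2.0):
        print(f"x={x:.2f}  empirical={empirical_cdf(sums, x):.4f}  "
              f"Erlang={erlang_cdf(x, n, lam):.4f}")
```

Changing `n` generalizes the check to sums of more than two exponential distributions, in line with the induction argument above.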

COPULAS
A copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. Copulas are used to describe the dependence between random variables and are typically used to model distributions that are correlated with one another.

In the standard definition, on which Sklar’s Theorem builds, an m‐dimensional copula (or m‐copula) is a function C from the unit m‐cube $[0, 1]^m$ to the unit interval [0, 1] that satisfies the following conditions:

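$$\begin{aligned}
&C(1, \ldots, 1, a_n, 1, \ldots, 1) = a_n \quad \text{for every } n \le m \text{ and all } a_n \in [0, 1]; \\
&C(a_1, \ldots, a_m) = 0 \quad \text{if } a_n = 0 \text{ for any } n \le m; \\
&C \text{ is } m\text{-increasing.}
\end{aligned}$$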

Consider a continuous m‐variate distribution function $F(y_1, \ldots, y_m)$ with univariate marginal distributions $F_1(y_1), \ldots, F_m(y_m)$ and inverse (quantile) functions $F_1^{-1}, \ldots, F_m^{-1}$. Then we have $y_1 = F_1^{-1}(u_1) \sim F_1, \ldots, y_m = F_m^{-1}(u_m) \sim F_m$, where $u_1, \ldots, u_m$ are uniformly distributed variates. Therefore, the transforms of uniform variates are distributed as $F_i$ ($i = 1, \ldots, m$). This means we have:

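$$F(y_1, \ldots, y_m) = F\big(F_1^{-1}(u_1), \ldots, F_m^{-1}(u_m)\big) = P(U_1 \le u_1, \ldots, U_m \le u_m) = C(u_1, \ldots, u_m)$$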

where C is the unique copula associated with the distribution function. That is, if y ~ F and F is continuous, then $(F_1(y_1), \ldots, F_m(y_m)) \sim C$, and if U ~ C, then $(F_1^{-1}(u_1), \ldots, F_m^{-1}(u_m)) \sim F$. In practice, mathematical algorithms based on the Iman‐Conover method and Cholesky decomposition of the correlation matrix are used to compute the joint distribution from the marginal distributions. Copulas are parametrically specified joint distributions generated from given marginals. Therefore, properties of copulas are analogous to properties of joint distributions.
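
As a rough illustration of how a copula joins arbitrary marginals, the sketch below samples from a two‐variable Gaussian copula. The choice of a Gaussian copula, the correlation of 0.7, the two marginals, and every function name are assumptions made for this example. The paper names Cholesky decomposition and Iman‐Conover but does not specify a copula family, and this is not Risk Simulator's algorithm. Correlated standard normals are built with the 2×2 Cholesky factor, mapped to uniforms through the Normal cdf, and then pushed through the inverse cdf of each marginal.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_pairs(rho: float, trials: int = 20_000, seed: int = 3) -> list:
    """Draw (u1, u2) pairs with Uniform[0, 1] marginals linked by a
    Gaussian copula with correlation parameter rho (assumed family)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Cholesky factor of [[1, rho], [rho, 1]] is
    # [[1, 0], [rho, sqrt(1 - rho^2)]].
    l21, l22 = rho, math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(trials):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1, z2 = e1, l21 * e1 + l22 * e2  # correlated standard normals
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def exponential_quantile(u: float, lam: float) -> float:
    """Inverse CDF of Exponential(lam); assumes 0 <= u < 1."""
    return -math.log(1.0 - u) / lam


def pearson(xs, ys) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    pairs = gaussian_copula_pairs(rho=0.7)
    # Hypothetical marginals: Exponential(lam=2) and Normal(100, 15).
    second_marginal = NormalDist(100.0, 15.0)
    x = [exponential_quantile(u1, lam=2.0) for u1, _ in pairs]
    y = [second_marginal.inv_cdf(u2) for _, u2 in pairs]
    u1s, u2s = [p[0] for p in pairs], [p[1] for p in pairs]
    print(f"correlation of the uniforms: {pearson(u1s, u2s):.3f}")
    print(f"correlation of the transformed variables: {pearson(x, y):.3f}")
    print(f"mean of x: {sum(x) / len(x):.3f}  mean of y: {sum(y) / len(y):.2f}")
```

Each transformed variable keeps its own marginal distribution while the dependence comes entirely from the copula, which is the separation the relationship above expresses.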

Pros and Cons of Convolution and Copulas

Convolution theory is applicable and elegant for theoretical constructs of probability distributions. With basic addition, multiplication, and division of known i.i.d. distributions, we can determine their theoretical outputs. The limitation of convolution theory as applied above is that it assumes no correlations between the random variables (independently distributed), and that the individual distributions are exactly the same (identically distributed) and of commonly known form.

Therefore, if one modifies the distributions, uses exotic distributions, mixes and matches different non–i.i.d. distributions, adds correlations, applies truncation, uses empirical nonparametric distributions or historical simulation, or builds large Excel models that go beyond the simple addition, multiplication, or division shown above (such as exotic financial models and computations), convolution will not work and cannot predict the outcomes. In addition, convolution and copula theory can be used to compute correlated joint distributions for only a few distributions before the mathematics becomes intractable, owing to the large matrix inversions, multiple integrals, and differential equations that need to be solved. Therefore, practitioners are restricted to using Monte Carlo risk simulations.
