Distributional Fitting

  • By Admin
  • October 28, 2014

Theory
Another powerful simulation tool is distributional fitting, which answers two questions: which distribution should an analyst or engineer use for a particular input variable in a model, and what are the relevant distributional parameters? If no historical data exist, the analyst must make assumptions about the variables in question. One approach is the Delphi method, in which a group of experts is tasked with estimating the behavior of each variable. For instance, a group of mechanical engineers can be tasked with evaluating the extreme possibilities of a spring coil’s diameter through rigorous experimentation or guesstimates. These values can then be used as the variable’s input parameters (e.g., a uniform distribution with extreme values between 0.5 and 1.2). When testing is not possible (e.g., for market share or a revenue growth rate), management can still estimate potential outcomes and provide best-case, most-likely, and worst-case scenarios, from which a triangular or custom distribution can be created.
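
As a rough illustration of how such expert estimates become simulation inputs, the sketch below draws from a uniform distribution with the extremes mentioned above (0.5 and 1.2) and from a triangular distribution. It uses only Python's standard library rather than Risk Simulator, and the worst-case, most-likely, and best-case growth rates are hypothetical values chosen purely for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

N_TRIALS = 10_000

# Uniform assumption from the spring-coil example: extremes of 0.5 and 1.2.
diameter = [random.uniform(0.5, 1.2) for _ in range(N_TRIALS)]

# Triangular assumption from management estimates.
# HYPOTHETICAL values: worst case 2%, most likely 5%, best case 12% growth.
WORST, MOST_LIKELY, BEST = 0.02, 0.05, 0.12
# random.triangular takes (low, high, mode), in that order.
growth = [random.triangular(WORST, BEST, MOST_LIKELY) for _ in range(N_TRIALS)]

# Theoretical means for comparison: (0.5 + 1.2) / 2 for the uniform and
# (worst + most likely + best) / 3 for the triangular.
print(f"diameter mean: {statistics.mean(diameter):.3f}")
print(f"growth mean:   {statistics.mean(growth):.4f}")
```

With enough trials, the simulated means should settle near the theoretical values noted in the comments.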

However, if reliable historical data are available, distributional fitting can be performed. Assuming that historical patterns hold and that history tends to repeat itself, historical data can be used to find the best-fitting distribution and its relevant parameters, which better define the variables to be simulated. Figures 1, 2, and 3 illustrate a distributional-fitting example. The following illustration uses the Data Fitting file in the examples folder.

Procedure

Use the following steps to fit a distribution to a single variable:

  • Open a spreadsheet with existing data for fitting (e.g., use the Data Fitting example file).
  • Select the data you wish to fit, not including the variable name (data should be in a single column with multiple rows).
  • Select Risk Simulator | Tools | Distributional Fitting (Single-Variable).
  • Select the specific distributions you wish to fit to, or keep the default in which all distributions are selected, and click OK (Figure 1).
  • Review the results of the fit, choose the relevant distribution you want, and click OK (Figure 2).
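
For readers who want to see the idea behind these steps outside the spreadsheet, the following is a minimal sketch in plain Python, not Risk Simulator's actual routine. It assumes a small, hypothetical set of candidate distributions with closed-form maximum likelihood estimates, scores each with the Kolmogorov-Smirnov statistic, and ranks them from best to worst fit. The function names are illustrative, and the p-value uses the asymptotic Kolmogorov series, which tends to be optimistic when the parameters are estimated from the same data.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def ks_pvalue(d, n):
    """Asymptotic Kolmogorov p-value (an approximation for large n)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the series converges slowly here; the p-value is ~1
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def candidate_fits(data):
    """Closed-form maximum likelihood fits; assumes positive, non-constant data."""
    mu, sigma = fmean(data), pstdev(data)
    logs = [math.log(x) for x in data]
    normal = NormalDist(mu, sigma)
    log_normal = NormalDist(fmean(logs), pstdev(logs))
    lo, hi = min(data), max(data)
    return {
        "normal": ((mu, sigma), normal.cdf),
        "lognormal": ((log_normal.mean, log_normal.stdev),
                      lambda x: log_normal.cdf(math.log(x))),
        "exponential": ((mu,), lambda x: 1.0 - math.exp(-x / mu)),
        "uniform": ((lo, hi), lambda x: (x - lo) / (hi - lo)),
    }


def rank_fits(data):
    """Return (name, params, KS statistic, p-value) rows, best fit first."""
    n = len(data)
    rows = []
    for name, (params, cdf) in candidate_fits(data).items():
        d = ks_statistic(data, cdf)
        rows.append((name, params, d, ks_pvalue(d, n)))
    return sorted(rows, key=lambda row: row[2])


# Stand-in for the spreadsheet column: 1,000 draws from a normal(100, 10).
random.seed(1)
sample = [random.gauss(100, 10) for _ in range(1000)]
ranking = rank_fits(sample)
for name, params, d, p in ranking:
    shown = ", ".join(f"{v:.3f}" for v in params)
    print(f"{name:<12} params=({shown})  KS={d:.4f}  p={p:.4f}")
```

Sorting by the test statistic is equivalent to sorting by p-value here because every candidate is scored on the same sample size.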

Results Interpretation
The null hypothesis (H0) being tested is that the fitted distribution is the same distribution as the population from which the sample data come. Thus, if the computed p-value is lower than a critical alpha level (typically 0.10 or 0.05), the null hypothesis is rejected and the fitted distribution is deemed the wrong distribution. Conversely, the higher the p-value, the more consistent the data are with the fitted distribution. The p-value is sometimes loosely described as a percentage explained, but strictly it is the probability of seeing a test statistic at least as extreme as the one computed if the null hypothesis were true; it does not measure the share of variation explained. In Figure 2, the p-value of 0.9727 for a normal distribution with a mean of 99.28 and a standard deviation of 10.17 therefore gives no grounds for rejecting the normal distribution and indicates an especially good fit. The data came from a 1,000-trial simulation in Risk Simulator based on a normal distribution with a mean of 100 and a standard deviation of 10. Because only 1,000 trials were simulated, the fitted parameters are close to, but not exactly equal to, the specified distributional parameters.
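
To see why fitted parameters drift from the specified ones when the number of trials is finite, the sketch below simulates draws from a normal distribution with a mean of 100 and a standard deviation of 10 at several sample sizes and reports the fitted mean and standard deviation. It is an illustration in standard-library Python, not the Risk Simulator run behind Figure 2; the seed and the sample sizes other than 1,000 are arbitrary choices.

```python
import math
import random
from statistics import fmean, pstdev

TRUE_MEAN, TRUE_STDEV = 100.0, 10.0


def fit_normal(n_trials, seed=2014):
    """Simulate n_trials normal draws and return the fitted (mean, stdev)."""
    rng = random.Random(seed)  # arbitrary seed, fixed for reproducibility
    draws = [rng.gauss(TRUE_MEAN, TRUE_STDEV) for _ in range(n_trials)]
    return fmean(draws), pstdev(draws)


for n in (100, 1_000, 100_000):
    mean, stdev = fit_normal(n)
    # Typical size of the sampling error in the fitted mean.
    std_error = TRUE_STDEV / math.sqrt(n)
    print(f"n={n:>7}: mean={mean:8.3f}  stdev={stdev:7.3f}  "
          f"(std. error of mean ~ {std_error:.3f})")
```

At 1,000 trials the standard error of the mean is roughly 0.32, so fitted means within a unit or so of 100 are to be expected.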

Both the results (Figure 2) and the report (Figure 3) show the test statistic, p-value, theoretical statistics (based on the selected distribution), empirical statistics (based on the raw data), the original data (to maintain a record of the data used), and the assumption complete with the relevant distributional parameters (the assumption is created only if you selected the option to automatically generate assumptions and a simulation profile already exists). The results also rank all the selected distributions by how well they fit the data.

Fitting Multiple Variables
The process for fitting multiple variables is similar to that for fitting individual variables. However, the data should be arranged in columns (i.e., each variable occupies one column), and all the variables are fitted together. The same analysis is performed as in single-variable fitting. The difference is that only the final report is generated, and you do not get to review each variable’s distributional rankings. If the rankings are important, run the single-variable fitting procedure instead, one variable at a time.

Procedure

  • Open a spreadsheet with existing data for fitting.
  • Select the data you wish to fit (data should be in multiple columns with multiple rows).
  • Select Risk Simulator | Tools | Distributional Fitting (Multi-Variable).
  • Review the data, choose the types of distributions you want to fit to, and click OK.
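
The multi-variable steps above amount to repeating the single-variable fit for each column and keeping only the winner. The sketch below shows that loop in standard-library Python under stated assumptions: the column names and data are hypothetical, only three candidate distributions are considered, and the Kolmogorov-Smirnov statistic alone decides the best fit. It is not Risk Simulator's implementation.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def best_fit(data):
    """Return (name, params, KS statistic) of the closest candidate.

    Assumes non-constant data; the exponential candidate is only
    considered when every observation is non-negative.
    """
    mu, sigma = fmean(data), pstdev(data)
    lo, hi = min(data), max(data)
    normal = NormalDist(mu, sigma)
    candidates = {
        "normal": ((mu, sigma), normal.cdf),
        "uniform": ((lo, hi), lambda x: (x - lo) / (hi - lo)),
    }
    if lo >= 0:
        candidates["exponential"] = ((mu,), lambda x: 1.0 - math.exp(-x / mu))
    scored = [(ks_statistic(data, cdf), name, params)
              for name, (params, cdf) in candidates.items()]
    d, name, params = min(scored)  # smallest KS statistic wins
    return name, params, d


# HYPOTHETICAL data: one variable per column, as in the spreadsheet layout.
rng = random.Random(7)
columns = {
    "demand": [rng.gauss(50, 5) for _ in range(500)],
    "wait_time": [rng.expovariate(1 / 5) for _ in range(500)],
}

# Only the final choice per variable is kept, mirroring the final report.
report = {name: best_fit(values) for name, values in columns.items()}
for name, (dist, params, d) in report.items():
    shown = ", ".join(f"{v:.3f}" for v in params)
    print(f"{name:<10} -> {dist} ({shown}), KS={d:.4f}")
```

Because the per-variable rankings are discarded, this mirrors the trade-off described above: run a single-variable fit when the full ranking matters.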

Notes
Note that the statistical ranking methods used in the distributional fitting routines are the Chi-Square test and the Kolmogorov-Smirnov test. The former is used to test discrete distributions and the latter, continuous distributions. Briefly, a hypothesis test coupled with the maximum likelihood procedure and an internal optimization routine is used to find the best-fitting parameters for each distribution tested, and the results are ranked from the best fit to the worst fit. There are other distributional fitting tests, such as the Anderson-Darling and Shapiro-Wilk tests. However, these tests are very sensitive, and their critical values depend on the distribution being tested (the Shapiro-Wilk test, in particular, is a test of normality), which makes them poorly suited to Monte Carlo simulation distribution-fitting routines in which many different distributions are compared. They are most suited to testing normal distributions and distributions with normal-like behaviors (e.g., a binomial distribution with a high number of trials and symmetrical probabilities) and will provide less accurate results when applied to non-normal distributions. Take great care when using such tests. The Kolmogorov-Smirnov and Chi-Square tests employed in Risk Simulator are nonparametric and semiparametric in nature and are better suited for fitting both normal and non-normal distributions.
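
For reference, the quantities named in these notes can be written in their standard textbook forms as follows. The notation is an assumption introduced here (x_i for the n data points, f and F for a candidate density and cumulative distribution function with parameters θ, O_k and E_k for observed and expected counts in K classes); the exact variants and binning rules implemented in Risk Simulator are not stated in this article.

```latex
% Maximum likelihood estimate of the parameters of a candidate distribution
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \ln f(x_i; \theta)

% Kolmogorov-Smirnov statistic (continuous distributions):
% the largest gap between the empirical CDF and the fitted CDF
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}, \qquad
D_n = \sup_{x} \left| F_n(x) - F(x; \hat{\theta}) \right|

% Chi-Square statistic (discrete distributions), summed over K classes,
% where p_k is the fitted probability of class k
\chi^2 = \sum_{k=1}^{K} \frac{(O_k - E_k)^2}{E_k}, \qquad
E_k = n \, p_k(\hat{\theta})
```

In both cases a smaller statistic corresponds to a larger p-value and therefore to a higher rank among the fitted distributions.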
