File Name: Analytics – Statistical Analysis
Location: Modeling Toolkit | Analytics | Statistical Analysis
Brief Description: Applies the Statistical Analysis tool to determine the key statistical characteristics of your dataset, including linearity, nonlinearity, normality, distributional fit, distributional moments, forecastability, trends, and the stochastic nature of the data
Requirements: Modeling Toolkit, Risk Simulator
This model provides a sample dataset on which to run the Statistical Analysis tool in order to determine the statistical properties of the data. The diagnostics run includes checking the data for various statistical properties.
To run the analysis, follow the instructions below:
Spend some time going through the reports generated to get a better understanding of the statistical tests performed.
Figure 8.1: Running the Statistical Analysis tool
Figure 8.2: Statistical tests
The analysis reports include the following statistical results:
Figure 8.3 shows a sample report generated by Risk Simulator that analyzes the statistical characteristics of your dataset, providing all the requisite distributional moments and statistics to help you determine the specifics of your data, including the skewness and extreme events (kurtosis and outliers). The descriptions of these statistics are listed in the report for your review. Each variable will have its own set of reports.
Figure 8.3: Sample report on descriptive statistics
Figure 8.4 shows the results of taking your existing dataset and creating a distributional fit on 24 distributions. The best-fitting distribution (after Risk Simulator goes through multiple iterations of internal optimization routines and statistical analyses) is shown in the report, including the test statistics and requisite p-values indicating the level of fit. For instance, Figure 8.4’s example dataset shows a 99.54% fit to a normal distribution with a mean of 319.58 and a standard deviation of 172.91. In addition, the actual statistics from your dataset are compared to the theoretical statistics of the fitted distribution, providing yet another layer of comparison. Using this methodology, you can take a large dataset and collapse it into a few simple distributional assumptions that can be simulated, thereby vastly reducing the complexity of your model or database while at the same time adding an added element of analytical prowess to your model by including risk analysis.
Figure 8.4: Sample report on distributional fitting
Sometimes, you might need to determine if the dataset’s statistics are significantly different than a specific value. For instance, if the mean of your dataset is 0.15, is this statistically significantly different than, say, zero? What about if the mean was 0.5 or 10.5? How far enough away does the mean have to be from this hypothesized population value to be deemed statistically significantly different? Figure 8.5 shows a sample report of such a hypothesis test.
Figure 8.5: Sample report on theoretical hypothesis tests
Figure 8.6 shows the test for normality. In certain financial and business statistics, there is a heavy dependence on normality (e.g., asset distributions of option pricing models, normality of errors in a regression analysis, hypothesis tests using t-tests, Z-tests, analysis of variance, and so forth). This theoretical test for normality is automatically computed as part of the Statistical Analysis tool.
Figure 8.6: Sample report on testing for normality
If your dataset is a time-series variable (i.e., data that has an element of time attached to them, such as interest rates, inflation rates, revenues, and so forth, that are time-dependent) then the Risk Simulator data analysis reports shown in Figures 8.7 to 8.11 will help in identifying the characteristics of this time-series behavior, including the identification of nonlinearity (Figure 8.7) versus linear trends (Figure 8.8), or a combination of both where there might be some trend and nonlinear seasonality effects (Figure 8.9). Sometimes, a time-series variable may exhibit a relationship to the past (autocorrelation). The report shown in Figure 8.10 analyzes if these autocorrelations are significant and useful in future forecasts, that is, to see if the past can truly predict the future. Finally, Figure 8.11 illustrates the report on nonstationarity to test if the variable can or cannot be readily forecasted with conventional means (e.g., stock prices, interest rates, foreign exchange rates are very difficult to forecast with conventional approaches and require stochastic process simulations) and identifies the best-fitting stochastic models such as a Brownian motion random walk, mean-reversion and jump-diffusion processes, and provides the estimated input parameters for these forecast processes.
Figure 8.7: Sample report on nonlinear extrapolation forecast (nonlinear trend detection)
Figure 8.8: Sample report on trend-line forecasts (linear trend detection)
Figure 8.9: Sample report on time-series forecasting (seasonality and trend detection)
Figure 8.10: Sample report on autocorrelation (past relationship and correlation detection)
Figure 8.11: Sample report on stochastic parameter calibration (nonstationarity detection)