BACKTESTING REVISITED

June 19, 2000

If value at risk forecasts based on an internal market risk model, such as M, are used in practice, it is of crucial importance to know whether M's forecast quality is sufficiently high to be relied on.

A graphic display of the time series of forecasts and of the observed changes, l_t, of the P&L is a plot often included in risk reports for senior management. It gives important information, but it does not by itself allow a quantitative judgement about the forecast quality beyond "the model is conservative".

To make the idea of forecast quality precise, assume for a moment that model M works perfectly. To be specific, but still general enough, assume that M is a Delta Normal model, as outlined in RiskMetrics¹. In that framework the forecast distribution P_t is a normal distribution, say N(0, σ_t²), so that the VaR forecast is 2.33 σ_t. For a perfect M the transformed observations

$$ s_t = \frac{l_t}{2.33\,\sigma_t} \qquad (1) $$

could not be distinguished from independent draws from a N(0, 2.33⁻²) distribution, that is, a normal distribution with standard deviation 1/2.33 ≈ 0.43. Any deviation from these properties, for example with respect to the distribution, the homogeneity (identically distributed draws) or the independence of the process, highlights deficiencies in M's forecast quality. To detect such deviations, well-known exploratory techniques (time plots; quantile-quantile plots, or Q-Q plots for short; and so-called probability plots, or P-P plots) are applied to s_t. These graphical tools, outlined in a moment, have been applied successfully in practice at very different hierarchy levels of the portfolio tree, ranging from the whole trading book to the trading portfolios of specific desks. Compared with inferential tests, for example a Box-Ljung test, these methods are simple, which makes the results easier to communicate.
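The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `standardize`, the simulated volatilities and the convention that a VaR exceedance corresponds to s_t < -1 are assumptions made for the example.

```python
import random
import statistics

Z_99 = 2.33  # normal quantile behind the N(0, 2.33^-2) reference in the text


def standardize(pnl_changes, sigmas, z=Z_99):
    """Return s_t = l_t / (z * sigma_t): the P&L change in units of the VaR forecast."""
    if len(pnl_changes) != len(sigmas):
        raise ValueError("need one sigma forecast per P&L change")
    return [l / (z * sigma) for l, sigma in zip(pnl_changes, sigmas)]


# Simulated "perfect" model (illustrative data, not taken from the article):
# each realized change l_t is drawn from exactly the forecast N(0, sigma_t^2).
rng = random.Random(0)
sigmas = [1.0 + 0.5 * (t % 10) for t in range(5000)]
pnl_changes = [rng.gauss(0.0, sigma) for sigma in sigmas]

s = standardize(pnl_changes, sigmas)
ideal_std = 1.0 / Z_99             # about 0.43
sample_std = statistics.pstdev(s)  # should be close to ideal_std

# Assumed convention: a loss beyond the VaR forecast shows up as s_t < -1.
exceedance_rate = sum(1 for x in s if x < -1.0) / len(s)

print(f"ideal std {ideal_std:.3f}, sample std {sample_std:.3f}")
print(f"exceedance rate {exceedance_rate:.4f}")
```

For a well-calibrated model the sample standard deviation should sit near 0.43 and roughly 1% of the s_t should fall below -1; a much smaller spread points to conservative forecasts.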

One last word regarding the distinct features of Q-Q plots and P-P plots. Both are important graphical techniques for comparing the shapes of distributions. Whereas a Q-Q plot plots the empirical quantiles F_n⁻¹(p) against the quantiles G⁻¹(p) of a known distribution, a P-P plot plots the empirical percentages F_n(x) against G(x) for various x. If F_n and G coincide, both graphs show a straight line. Hence, deviations from the straight line signal different distributions. Unlike the Q-Q plot, however, the P-P plot does not yield a straight line when F_n and G have the same shape but differ in location or scale. This property makes the P-P plot less useful for studying shape in terms of location-scale families such as the family of normal distributions. Furthermore, the P-P plot gives more emphasis to how the data and the distribution compare in the center, whereas the Q-Q plot gives more space to the tails.
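The difference between the two plots can be checked numerically. The following sketch computes the coordinates of both plots for a sample that has the same shape as the reference distribution but a smaller scale. The helper names, the plotting positions (i + 0.5)/n and the two standard deviations used are assumptions chosen for illustration.

```python
from statistics import NormalDist


def plotting_positions(n):
    """Probabilities (i + 0.5) / n attached to the ordered sample."""
    return [(i + 0.5) / n for i in range(n)]


def qq_points(sample, reference):
    """Pairs (reference quantile G^-1(p), empirical quantile)."""
    xs = sorted(sample)
    ps = plotting_positions(len(xs))
    return [(reference.inv_cdf(p), x) for p, x in zip(ps, xs)]


def pp_points(sample, reference):
    """Pairs (reference probability G(x), empirical probability)."""
    xs = sorted(sample)
    ps = plotting_positions(len(xs))
    return [(reference.cdf(x), p) for p, x in zip(ps, xs)]


reference = NormalDist(0.0, 0.43)  # reference distribution G
narrow = NormalDist(0.0, 0.19)     # same shape, smaller scale

# Idealized sample: the exact quantiles of the narrow distribution.
sample = [narrow.inv_cdf(p) for p in plotting_positions(100)]

qq = qq_points(sample, reference)
pp = pp_points(sample, reference)

qq_slopes = [y / x for x, y in qq]            # constant: a straight line
pp_max_gap = max(abs(u - p) for u, p in pp)   # clearly off the diagonal

print(f"Q-Q slope range: {min(qq_slopes):.4f} to {max(qq_slopes):.4f}")
print(f"largest P-P gap from the diagonal: {pp_max_gap:.3f}")
```

The Q-Q points lie on a straight line whose slope is the ratio of the two scales, while the P-P points bend away from the diagonal, which is the behavior described above for location-scale families.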

The day-by-day output of M is a time series of prediction-realization pairs:

$$ (P_t,\, l_t), \qquad t = 1, \dots, n \qquad (2) $$

where P_t expresses the uncertainty about changes of the P&L over a period of time through a whole forecast distribution. Information of particular interest is extracted by a parameter, T(P_t), for example, the VaR.

Now, what can be inferred about the forecast quality of M from the given time series (2)? As practice has shown, VaR is often calculated conservatively, at least at the trading book level. Hence, the number of times the VaR forecasts were exceeded by their associated changes of the P&L does not contain enough information about M's forecast quality. The best way to tackle this problem is to combine l_t with the cumulative distribution function, F_t, of P_t.
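As a rough illustration of combining l_t with F_t, the sketch below evaluates F_t(l_t) for normal forecast distributions and measures how far the resulting values are from a uniform distribution on [0,1]. The function names, the simulated data and the use of a Kolmogorov-Smirnov type distance as a summary are assumptions of this example; the article itself relies on graphical tools.

```python
import random
from statistics import NormalDist


def pit_values(pnl_changes, sigmas):
    """u_t = F_t(l_t) for normal forecast distributions N(0, sigma_t^2)."""
    return [NormalDist(0.0, sigma).cdf(l) for l, sigma in zip(pnl_changes, sigmas)]


def ks_distance_to_uniform(u_values):
    """Largest gap between the empirical CDF of u_t and the U[0,1] CDF."""
    u = sorted(u_values)
    n = len(u)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))


# Illustrative data: P&L changes drawn from known volatilities.
rng = random.Random(1)
true_sigmas = [1.0 + 0.1 * (t % 5) for t in range(2000)]
pnl_changes = [rng.gauss(0.0, sigma) for sigma in true_sigmas]

# A perfect model forecasts the true volatility.
d_perfect = ks_distance_to_uniform(pit_values(pnl_changes, true_sigmas))

# A conservative model forecasts twice the true volatility.
doubled = [2.0 * sigma for sigma in true_sigmas]
d_conservative = ks_distance_to_uniform(pit_values(pnl_changes, doubled))

print(f"perfect model:      {d_perfect:.3f}")
print(f"conservative model: {d_conservative:.3f}")
```

With the conservative forecasts the values F_t(l_t) pile up around 0.5, so the distance to the uniform distribution is large even though no VaR exceedance need ever occur.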

EMPIRICAL ANALYSIS

Every analysis should start with a time plot of the data at hand, see Figure 1. This particular example shows the time series of s_t at trading book level. All 180 observations lie in the interval [-0.75, 0.75], hence no VaR exceedance was observed. The narrow range of the data indicates a rather conservative estimation of risk. The S-shaped Q-Q plot in Figure 2 shows distinctive deviations of the data from normality, especially in the tails of the distribution, the part that is most interesting. A related plot, the P-P plot in Figure 3, which is also advocated by Crnkovic and Drachman² of J.P. Morgan, is based on the fact that in an ideal model the transformations F_t(l_t) could not be distinguished from independent draws from a uniform distribution, U[0,1]. It also shows a clear deviation from an ideal fit. Both plots indicate a certain heterogeneity, because the S-shape is typical of mixture distributions, that is to say, of different regimes. Furthermore, the data are not realizations of independent draws, because the exceedances over a specific threshold cluster, see Figure 4. To quantify the amount of conservatism, the standard deviation (std) of s is calculated, yielding std = 0.19. In the ideal model a standard deviation of 1/2.33 ≈ 0.43 is to be expected. Hence the VaR model at hand overestimates the risk by more than 100% (0.43/0.19 ≈ 2.3).
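The conservatism figure and the clustering check can be reproduced with a short calculation. The sketch below uses the standard deviation of 0.19 reported above. The helper functions and the toy series are hypothetical and only illustrate one simple way to look for clustering, namely the lag-1 autocorrelation of threshold-exceedance indicators.

```python
Z_99 = 2.33
ideal_std = 1.0 / Z_99   # about 0.43, expected for a perfect model
observed_std = 0.19      # value reported in the text

# How much wider the forecast distribution is than the data warrant.
overestimation_factor = ideal_std / observed_std


def exceedance_indicators(s_values, threshold):
    """1 where |s_t| exceeds the threshold, 0 otherwise."""
    return [1 if abs(x) > threshold else 0 for x in s_values]


def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation; positive values suggest clustering."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        raise ValueError("series is constant")
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


# Toy series (not the article's data): large moves arrive in one block.
toy_s = [0.6, 0.7, 0.65, 0.1, 0.0, -0.1, 0.05, 0.1]
toy_indicators = exceedance_indicators(toy_s, threshold=0.5)
toy_acf = lag1_autocorrelation(toy_indicators)

print(f"overestimation factor: {overestimation_factor:.2f}")
print(f"indicators: {toy_indicators}, lag-1 autocorrelation: {toy_acf:.2f}")
```

An overestimation factor near 2.3 means the forecast volatility is more than twice what the data support, and a clearly positive autocorrelation of the indicators is a simple numerical counterpart to the clustering visible in a plot such as Figure 4.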

SUMMARY

The forecast quality of the model considered is rather poor: it meets none of the requirements of a good forecast. It is not acceptably calibrated, nor does it capture important features such as dependence or different regimes. Its risk estimates are too imprecise. The proposed elementary methods extract and highlight important information. Hence, it seems good advice to include in risk reports Q-Q plots and a figure quantifying the conservatism of the model, which indicates whether a recalibration may be needed.

¹ RiskMetrics (1996) J. P. Morgan & Co. RiskMetrics Technical Document. New York, December 1996.

² Crnkovic, C. and J. Drachman (1996) Quality Control. Risk Magazine, No. 9, pp. 138-143.

This week's Learning Curve was written by Gerhard Stahl, senior analyst, and Uwe Traber, head of department, at the Bundesaufsichtsamt für das Kreditwesen (Federal Banking Supervisory Office), Berlin.