PRINCIPAL COMPONENT ANALYSIS

January 17, 2000 12:00 am


INTRODUCTION

Trading books as well as debt portfolios of banks are exposed to a huge number of factors driving their mark-to-market value. Consider, for instance, the fixed income portfolio of a bank that is primarily active in Europe. The P&L of such a portfolio depends on a considerable number of yield curves. In the government bond sector alone, there are 10 yield curves making up Euroland. In addition, the yield curves of the U.K., Norway, Sweden, Denmark, Switzerland and the Eastern European economies must be included. If we now think of each yield curve as being discretized at 15 grid-points, i.e. we regard it as a 15-dimensional vector of positive real numbers, we already have a total of more than 250 risk drivers. On the other hand, it is clear that adjacent grid-points along a yield curve, at the very least, are highly correlated with each other. One strategy to reduce the dimensionality of the original problem is therefore to exploit this inherent linear dependence.

Another important area where one tries to keep the number of driving factors small is the pricing of interest-rate options with multi-factor models. In particular, multi-dimensional American- or Bermudan-style interest-rate options, whose pricing constitutes a so-called stochastic optimal stopping problem, give rise to formidable computational complexity in terms of the running time of the algorithms, which are typically based on Monte Carlo simulation.

One way to reduce the number of factors while retaining almost as much information as the original 15 rates contain, and at the same time to gain insight into the term structure of a yield curve, is to identify the common factors that drive the yield curve. This leads to what is known in the statistical literature as Principal Component Analysis (PCA). Long before PCA made its way into finance, it was applied successfully, as it still is today, in the social sciences and in biostatistics.

WHAT IS PCA AND HOW DOES IT WORK?

To start with, consider an individual yield curve and let $\hat{\Sigma}$ be the estimated covariance matrix of the corresponding 15 rates (or, more precisely, of their log-returns). PCA is concerned with explaining the variance-covariance structure through a few linear combinations of the original variables; since it is based on the estimated covariance matrix, it can be understood as a fit to the historical data. Performing a PCA means, in essence, successively identifying those normalized linear combinations of the 15 log-returns that carry the highest variance, each new combination being required to be uncorrelated with those already found. Algebraically, these linear combinations, the so-called Principal Components (PCs), are given by the eigenvectors belonging to the largest eigenvalues of $\hat{\Sigma}$: each eigenvector supplies the weights of one PC.

Note that $\hat{\Sigma}$, like any sample covariance matrix, is positive semi-definite, i.e. it has only non-negative eigenvalues. Since $\hat{\Sigma}$ is symmetric, its eigenvectors can be chosen not only linearly independent but even pairwise orthogonal. In statistical terms, this orthogonality means that the PCs are uncorrelated; for the multivariate normal distribution, uncorrelatedness and stochastic independence coincide. Thus, the covariance matrix of the 15 PCs is a diagonal matrix whose diagonal elements are the eigenvalues belonging to the corresponding eigenvectors; these eigenvalues are precisely the variances of the PCs.

Now arrange the 15 PCs in descending order of their variances, so that the first PC carries the highest variance, and so forth. The total variance is the sum of the variances of all PCs, i.e. the sum of all eigenvalues, which equals the trace of $\hat{\Sigma}$. Empirically, the leading three PCs are known to explain more than 90% of the total variance for most yield curves. Moreover, these three PCs admit a nice interpretation, namely as parallel shift, twist and butterfly (see figure 1).
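
The steps just described (estimate $\hat{\Sigma}$, compute its eigenvectors, sort them by eigenvalue, measure the share of total variance explained) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `pca_from_log_returns` is hypothetical, and the input is synthetic data generated from assumed level, slope and curvature factors rather than market data.

```python
import numpy as np


def pca_from_log_returns(log_returns):
    """PCA of the sample covariance matrix of yield-curve log-returns.

    log_returns: array of shape (T, n), one row per observation date and
    one column per grid-point. Returns (eigenvalues, eigenvectors) sorted by
    descending eigenvalue; column j of `eigenvectors` holds the weights of PC j.
    """
    sigma_hat = np.cov(log_returns, rowvar=False)   # n x n, symmetric, PSD
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # re-sort: largest variance first
    eigvals = np.clip(eigvals[order], 0.0, None)     # drop tiny negative round-off
    return eigvals, eigvecs[:, order]


# Synthetic data (illustration only): 15 grid-points driven by assumed
# level, slope and curvature factors plus a little independent noise.
rng = np.random.default_rng(0)
T, n = 500, 15
m = np.linspace(0.0, 1.0, n)                         # scaled maturities
shapes = np.vstack([np.ones(n), m - 0.5, (m - 0.5) ** 2 - 1.0 / 12.0])
factor_moves = rng.normal(size=(T, 3)) * np.array([0.010, 0.004, 0.002])
log_returns = factor_moves @ shapes + 0.0005 * rng.normal(size=(T, n))

eigvals, eigvecs = pca_from_log_returns(log_returns)

# Time series of the PCs; their sample covariance is diagonal (up to round-off)
# with the eigenvalues, i.e. the PC variances, on the diagonal.
pc_series = (log_returns - log_returns.mean(axis=0)) @ eigvecs
pc_cov = np.cov(pc_series, rowvar=False)

# Share of the total variance (sum of all eigenvalues) explained by 3 PCs
explained_3 = eigvals[:3].sum() / eigvals.sum()
print(f"leading 3 PCs explain {explained_3:.1%} of total variance")
```

Using `eigh` rather than a general eigen-solver exploits the symmetry of $\hat{\Sigma}$ and returns orthonormal eigenvectors directly.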

Sometimes it is necessary to capture a certain pre-assigned proportion of the total variance, say 95%. This is easily achieved by summing the variances of the leading PCs, in descending order, until the threshold is reached.
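
A minimal sketch of this rule in Python with NumPy; the function name `n_components_for_threshold` and the eigenvalues in the example are hypothetical and chosen purely for illustration.

```python
import numpy as np


def n_components_for_threshold(eigenvalues, threshold=0.95):
    """Smallest k such that the k largest eigenvalues (PC variances) account
    for at least `threshold` of the total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending
    cumulative_share = np.cumsum(lam) / lam.sum()
    # index of the first cumulative share that reaches the threshold
    k = int(np.searchsorted(cumulative_share, threshold)) + 1
    return min(k, lam.size)                                      # guard against round-off at 100%


# Hypothetical PC variances: cumulative shares 80%, 90%, 96%, 98.5%, 100%
example_variances = [8.0, 1.0, 0.6, 0.25, 0.15]
print(n_components_for_threshold(example_variances, 0.95))      # → 3
```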

Restricting attention to the leading PCs obviously reduces the dimensionality of the original problem considerably. Instead of 15 factors for a country-specific curve, it is generally sufficient, at least for the purpose of risk quantification, to consider only the leading three PCs.
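
To illustrate the reduction, the following sketch replaces a 15 x 15 covariance matrix by its rank-3 approximation $V_3 \Lambda_3 V_3^{T}$ built from the three leading PCs, and summarizes a vector of 15 log-returns by 3 PC moves. The covariance matrix is a hypothetical example (assumed volatilities and an assumed correlation that decays with the distance between grid-points), and the helper `reduced_covariance` is illustrative, not a library function.

```python
import numpy as np


def reduced_covariance(sigma_hat, k=3):
    """Rank-k approximation of sigma_hat from its leading k PCs.

    Returns (V_k, lambda_k, V_k diag(lambda_k) V_k^T); V_k is n x k.
    """
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    top = np.argsort(eigvals)[::-1][:k]              # indices of the k largest eigenvalues
    v_k, lam_k = eigvecs[:, top], eigvals[top]
    return v_k, lam_k, (v_k * lam_k) @ v_k.T          # scales column j of V_k by lambda_j


# Hypothetical covariance of 15 grid-points: assumed daily vols and a
# correlation of 0.97**|i - j| between grid-points i and j.
n = 15
idx = np.arange(n)
vols = np.linspace(0.012, 0.006, n)
sigma_hat = np.outer(vols, vols) * 0.97 ** np.abs(idx[:, None] - idx[None, :])

v3, lam3, sigma_3 = reduced_covariance(sigma_hat, k=3)

# A vector of 15 log-returns is summarized by 3 PC moves and rebuilt from them.
x = 0.001 * np.ones(n)                               # hypothetical parallel move
pc_moves = v3.T @ x                                  # 3 numbers instead of 15
x_approx = v3 @ pc_moves
print(lam3.sum() / np.trace(sigma_hat))              # share of total variance kept
```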

Having discussed the application of PCA to a single yield curve, we proceed to portfolios that depend on a multitude of yield curves. Here one would structure the overall covariance matrix in such a way that the country-specific submatrices are ordered according to their relevance for the portfolio under consideration. More precisely, one places the covariance submatrices of the individual yield curves along the diagonal of the overall covariance matrix. One can then apply PCA to each diagonal block of this grand matrix and transform the grand matrix accordingly, keeping only the first three PCs of every diagonal block. The blocks along the diagonal of the resulting covariance matrix are then diagonal matrices, while the off-diagonal blocks are in general fully populated. The entries of the off-diagonal blocks represent the covariances (or, after normalization, the correlations) between the PCs of different markets. For practical purposes, it is sufficient to consider only primary correlations in the off-diagonal blocks, i.e. correlations between PCs of the same order. For instance, one would take into account the correlation between the parallel shifts of different curves, while the correlation between a parallel shift and a twist would be set to zero. This makes intuitive sense, as even the parallel shift and the twist of one and the same yield curve are uncorrelated.
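
The block-wise procedure can be sketched as follows, under simplifying assumptions: two hypothetical curves of 15 grid-points each, an assumed covariance structure, and illustrative helper names (`leading_pcs`, `multi_curve_pc_covariance`). The transformation matrix carries the leading PCs of each curve in its diagonal blocks; the optional mask keeps only primary correlations, i.e. entries linking PCs of the same order.

```python
import numpy as np


def leading_pcs(block, k):
    """Loading vectors (columns) of the k leading PCs of one curve's covariance block."""
    eigvals, eigvecs = np.linalg.eigh(block)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]


def multi_curve_pc_covariance(sigma, sizes, k=3, primary_only=True):
    """Covariance matrix of the k leading PCs of every curve.

    sigma: covariance of all grid-points, curves stacked as consecutive
    diagonal blocks with the given sizes. Returns an (m*k) x (m*k) matrix
    for m curves, ordered curve by curve.
    """
    m = len(sizes)
    starts = np.concatenate([[0], np.cumsum(sizes)])
    transform = np.zeros((starts[-1], m * k))        # block-diagonal matrix of PC loadings
    for c in range(m):
        rows = slice(starts[c], starts[c + 1])
        transform[rows, c * k:(c + 1) * k] = leading_pcs(sigma[rows, rows], k)
    pc_cov = transform.T @ sigma @ transform         # diagonal blocks come out diagonal
    if primary_only:
        # keep only entries between PCs of the same order (shift/shift, twist/twist, ...)
        pc_order = np.tile(np.arange(k), m)
        pc_cov = np.where(pc_order[:, None] == pc_order[None, :], pc_cov, 0.0)
    return pc_cov


# Two hypothetical curves of 15 points each, cross-curve correlation scaled by 0.6
n = 15
idx = np.arange(n)
within = 0.97 ** np.abs(idx[:, None] - idx[None, :])
corr = np.kron(np.array([[1.0, 0.6], [0.6, 1.0]]), within)
vols = np.concatenate([np.linspace(0.012, 0.006, n), np.linspace(0.015, 0.008, n)])
sigma = np.outer(vols, vols) * corr

full = multi_curve_pc_covariance(sigma, [n, n], k=3, primary_only=False)
primary = multi_curve_pc_covariance(sigma, [n, n], k=3, primary_only=True)
print(np.round(primary / np.sqrt(np.outer(np.diag(primary), np.diag(primary))), 2))
```

Setting entries of a covariance matrix to zero can in general destroy positive semi-definiteness, so it seems prudent to check the eigenvalues of the masked matrix before using it.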

APPLYING PCA TO THE COVARIANCE OR TO THE CORRELATION MATRIX?

Depending on the context, it may also make sense to apply PCA to the correlation matrix instead of the covariance matrix. From a numerical point of view this is even favorable, as the eigenvalues are then confined to a smaller range (they are non-negative and sum to $n$, the number of variables). It is important to note that there is no simple relationship between the PCs of the two matrices, since the PCs are not scale invariant. Mathematically speaking, if $\sigma_i$ denotes the standard deviation of the $i$-th variable and $S = \mathrm{diag}(\sigma_1^{-1}, \dots, \sigma_n^{-1})$, then the correlation matrix $C$ is obtained from the covariance matrix $\Sigma$ by the simple transformation $C = S \Sigma S$. However, if $v$ is an eigenvector (PC) of $\Sigma$, then the transformed vector $S^{-1} v$ need not be an eigenvector (PC) of $C$: from $\Sigma v = \lambda v$ one only obtains $C S^{-1} v = S \Sigma v = \lambda S v$, and $S v$ is in general not a multiple of $S^{-1} v$. Consequently, the PCs may differ significantly depending on whether they are obtained from the correlation matrix or from the covariance matrix.
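
The lack of a simple relationship can be checked numerically on a small example. The sketch below uses an assumed 2 x 2 covariance matrix with unequal variances and a hypothetical helper `is_eigenvector`; it forms $C = S\Sigma S$ and tests whether $S^{-1}v$, with $v$ the first PC of $\Sigma$, is an eigenvector of $C$.

```python
import numpy as np


def is_eigenvector(matrix, vec, tol=1e-10):
    """True if matrix @ vec is (numerically) a multiple of vec."""
    image = matrix @ vec
    rayleigh = vec @ image / (vec @ vec)             # best-fitting eigenvalue
    return np.linalg.norm(image - rayleigh * vec) <= tol * np.linalg.norm(image)


# Toy covariance matrix with unequal variances (illustrative numbers)
sigma = np.array([[4.0, 1.0],
                  [1.0, 1.0]])
s = np.diag(1.0 / np.sqrt(np.diag(sigma)))           # S = diag(1/sigma_1, ..., 1/sigma_n)
corr = s @ sigma @ s                                 # C = S Sigma S

_, v_cov = np.linalg.eigh(sigma)
_, v_corr = np.linalg.eigh(corr)
first_pc_cov = v_cov[:, -1]                          # eigh sorts ascending: last column = first PC
first_pc_corr = v_corr[:, -1]
candidate = np.linalg.inv(s) @ first_pc_cov          # S^{-1} v

print(is_eigenvector(sigma, first_pc_cov))           # v is a PC of Sigma
print(is_eigenvector(corr, candidate))               # S^{-1} v is not a PC of C
print(first_pc_cov, first_pc_corr)                   # first PCs point in different directions
```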

PCA FOR VALUE-AT-RISK (VAR) PURPOSES

Consider a portfolio of instruments whose prices depend more or less linearly on the market rates, for example a diversified bond portfolio.

The variation in the price of such a portfolio with respect to the PCs can be understood as the systematic risk to which the whole market is exposed, while idiosyncratic risk, i.e. the price variation specific to a single security, is obviously not captured.

The simplest way to determine the systematic risk in terms of the Value-at-Risk (VaR) measure is the delta-normal method. It gives the VaR as the quantile $z_{1-\alpha}$ of the standard normal distribution for the chosen confidence level $1-\alpha$, multiplied by the square root of the portfolio variance. The portfolio variance is easily computed from the estimated covariance matrix $\hat{\Sigma}$ of the log-returns of the underlying market factors and from the delta vector $\delta = (\delta_1, \dots, \delta_n)$ of portfolio sensitivities with respect to the market rates:

$$\mathrm{VaR} = z_{1-\alpha}\,\sqrt{\delta\,\hat{\Sigma}\,\delta^{T}}$$
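
The delta-normal formula translates directly into code. In the sketch below (an illustration, not a production implementation), `delta_normal_var` is a hypothetical helper, the sensitivities are assumed to be expressed as changes in portfolio value per unit log-return of each market factor, and $\hat{\Sigma}$ is assumed to be scaled to the VaR horizon; the quantile comes from Python's standard-library `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist


def delta_normal_var(delta, sigma_hat, alpha=0.01):
    """Delta-normal VaR: z_{1-alpha} * sqrt(delta @ sigma_hat @ delta).

    delta: portfolio sensitivities to the log-returns of the n market factors
    (assumed in currency units per unit log-return); sigma_hat: n x n
    covariance matrix of those log-returns over the VaR horizon.
    """
    delta = np.asarray(delta, dtype=float)
    variance = float(delta @ sigma_hat @ delta)
    z = NormalDist().inv_cdf(1.0 - alpha)            # quantile z_{1-alpha}
    return z * np.sqrt(max(variance, 0.0))           # clip tiny negative round-off


# Hypothetical two-factor example: daily log-return vols of 1% and 2%,
# uncorrelated, and sensitivities of 1,000,000 to each factor.
sigma_example = np.diag([0.01 ** 2, 0.02 ** 2])
print(delta_normal_var([1e6, 1e6], sigma_example, alpha=0.01))
```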

As mentioned above, the estimated covariance matrix $\hat{\Sigma}$ is always positive semi-definite. Since a sample covariance matrix estimated from $T$ observations has rank at most $T$ (at most $T-1$ if the mean is estimated as well), $\hat{\Sigma}$ is singular whenever the number $n$ of market risk factors exceeds the length $T$ of the historical observation period. This implies that there exist non-zero sensitivity vectors $\delta$ for which the portfolio variance $\delta\,\hat{\Sigma}\,\delta^{T}$, and hence the VaR, vanishes. Economically, this amounts to the existence of a riskless portfolio. Obviously, this merely reflects an artifact of the estimation procedure, namely the insufficient length of the historical time series.
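
The singularity is easy to reproduce. The sketch below uses purely synthetic, assumed data with more risk factors ($n = 15$) than observations ($T = 10$); any eigenvector of $\hat{\Sigma}$ belonging to a zero eigenvalue is then a non-zero sensitivity vector with vanishing estimated variance.

```python
import numpy as np

# Hypothetical setting: n = 15 risk factors but only T = 10 observations.
rng = np.random.default_rng(1)
T, n = 10, 15
log_returns = 0.01 * rng.normal(size=(T, n))        # synthetic data, illustration only

sigma_hat = np.cov(log_returns, rowvar=False)        # n x n sample covariance (mean estimated)
rank = np.linalg.matrix_rank(sigma_hat)
print(rank)                                          # at most T - 1 = 9 < n, so singular

# An eigenvector to a (numerically) zero eigenvalue is a non-zero sensitivity
# vector whose estimated portfolio variance, and hence VaR, vanishes.
eigvals, eigvecs = np.linalg.eigh(sigma_hat)         # ascending eigenvalues
riskless_delta = eigvecs[:, 0]
print(riskless_delta @ sigma_hat @ riskless_delta)   # zero up to round-off
```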

If PCA is applied in such a situation, there is a good chance that this deficiency can be resolved: as long as only PCs with strictly positive eigenvalues are retained, the reduced covariance matrix of the PCs is strictly positive definite.
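
A sketch of the reduction with the same kind of synthetic, assumed data: the hypothetical helper `retained_pc_covariance` keeps at most `k` leading PCs and, as a safeguard, only those whose eigenvalue exceeds a small relative tolerance (an assumption of this sketch). The resulting covariance matrix of the retained PCs is diagonal with strictly positive entries, hence positive definite, even though $\hat{\Sigma}$ itself is singular.

```python
import numpy as np


def retained_pc_covariance(sigma_hat, k=3, rel_tol=1e-10):
    """Covariance matrix of at most k leading PCs, keeping only PCs whose
    eigenvalue exceeds rel_tol times the largest eigenvalue.

    Returns (loadings V_k, diagonal PC covariance Lambda_k)."""
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    order = np.argsort(eigvals)[::-1]
    keep = [j for j in order[:k] if eigvals[j] > rel_tol * eigvals[order[0]]]
    return eigvecs[:, keep], np.diag(eigvals[keep])


# Synthetic data (illustration only): n = 15 factors, T = 10 observations
rng = np.random.default_rng(1)
T, n = 10, 15
log_returns = 0.01 * rng.normal(size=(T, n))
sigma_hat = np.cov(log_returns, rowvar=False)        # singular, rank <= T - 1

v_k, pc_cov = retained_pc_covariance(sigma_hat, k=3)
print(np.linalg.eigvalsh(sigma_hat).min())           # about 0: sigma_hat is singular
print(np.linalg.eigvalsh(pc_cov).min())              # > 0: reduced matrix is positive definite
```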

An often-stated criticism is that applying PCA to the covariance matrix can lead to an underestimation of the VaR, since a trader could concentrate the portfolio sensitivities on the higher-order PCs, i.e. the ones that are neglected. In that case, a straightforward application of PCA to $\hat{\Sigma}$ could result in a significant underestimation of the true VaR. This could be remedied by applying PCA to the delta-weighted covariance matrix $\Delta\,\hat{\Sigma}\,\Delta^{T}$, with $\Delta = \mathrm{diag}(\delta_1, \dots, \delta_n)$. However, this modification comes at the expense of transparency, as the resulting PCs no longer admit the nice interpretations mentioned above.
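
The criticism can be illustrated with a deliberately constructed example in which the sensitivities are placed entirely on the PC with the smallest variance. The sketch uses an assumed covariance matrix and a hypothetical helper `pca_var`, which computes the delta-normal VaR from the leading $k$ PCs only, i.e. $z_{1-\alpha}\sqrt{\sum_{j\le k}\lambda_j(\delta\cdot v_j)^2}$. It also checks a simple identity relating the delta-weighted matrix to the portfolio variance: the sum of all entries of $\Delta\,\hat{\Sigma}\,\Delta^{T}$ equals $\delta\,\hat{\Sigma}\,\delta^{T}$. It does not attempt to assess how well the delta-weighted remedy performs in practice.

```python
import numpy as np
from statistics import NormalDist


def pca_var(delta, sigma_hat, k, alpha=0.01):
    """Delta-normal VaR using only the k leading PCs of sigma_hat."""
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    top = np.argsort(eigvals)[::-1][:k]
    exposures = eigvecs[:, top].T @ delta            # sensitivity to each retained PC
    variance = float(np.sum(eigvals[top] * exposures ** 2))
    return NormalDist().inv_cdf(1.0 - alpha) * np.sqrt(max(variance, 0.0))


# Hypothetical 15-point curve covariance (assumed vols, decaying correlation)
n = 15
idx = np.arange(n)
vols = np.linspace(0.012, 0.006, n)
sigma_hat = np.outer(vols, vols) * 0.97 ** np.abs(idx[:, None] - idx[None, :])

# Deliberately put all sensitivity on the PC with the smallest variance
eigvals, eigvecs = np.linalg.eigh(sigma_hat)          # ascending: column 0 = last PC
delta_high = 1e6 * eigvecs[:, 0]

var_all_pcs = pca_var(delta_high, sigma_hat, k=n)     # equals the full delta-normal VaR
var_3_pcs = pca_var(delta_high, sigma_hat, k=3)       # essentially zero
print(var_all_pcs, var_3_pcs)

# Delta-weighted covariance matrix Delta Sigma Delta^T with Delta = diag(delta)
big_delta = np.diag(delta_high)
weighted = big_delta @ sigma_hat @ big_delta.T
print(weighted.sum(), delta_high @ sigma_hat @ delta_high)   # the two agree
```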

In our opinion, it is generally sufficient to apply PCA to the covariance matrix, as a gross underestimation rarely arises in practice. Also, the argument that a trader could intentionally structure his portfolio in such a way that its sensitivities lie on the higher-order PCs is hardly applicable in practice. Such a strategy would force him to restructure his portfolio on a daily basis, as the higher-order PCs are unstable and can change considerably from day to day. Moreover, the prime objective of a trader is to generate profit rather than to aim for zero risk. Since Markowitz's work on optimal portfolio selection, it has been known that a higher expected profit can only be gained by taking on higher risk. In this light, we regard this criticism of conventional PCA (i.e. PCA applied to the covariance matrix) as being of more theoretical than practical relevance.

This week's Learning Curve was written by Hermann Haaf and Hannes Wilhelm, in the risk control group at Dresdner Bank in Frankfurt.

Acknowledgement: We thank our colleague Hans-Juergen Brasch for drawing our attention to the limitations of the PCA for risk measurement and a possible solution.