Fat Tails in Finance

Fat tails refer to the excessive probability of “extreme” observations in a distribution. Natural disasters are a fact of life. They tend to have disastrous consequences but, fortunately, occur very rarely. Except for the occasional last-minute warning, they also have the nasty habit of being unpredictable. However, that does not imply that their probability is zero. Financial disasters are rather similar. Stock market crashes, oil crises, and exchange rate collapses occasionally remind us that there is a very relevant probability of observing extreme values. The magnitude of the fallout from such disasters explains the attention they attract, which is disproportionate to their supposedly minute probabilities of occurrence. Actuarial studies have acknowledged this fact for a long time. Estimating the probability of ruin is one of the major tasks new actuaries have to learn.

Attracted by expected payoffs, investors in financial markets are often lured into investing in high-risk assets. The downside risk is then managed or even fully covered by installing safeguards, such as stop loss or limit orders. As long as the market moves smoothly this does indeed guarantee a well-timed exit from an adverse market. However, it is well known that markets occasionally jump, sometimes excessively. In such situations, a market could suffocate from the accumulation of exit orders. While recently introduced circuit breakers provide market makers and brokers with valuable time to realign their positions, they do not offer similar protection for small investors. For them it is crucial to have at least some probabilistic idea of these “catastrophes.” Estimating these probabilities based on the past worst experience is not a good idea. The fact that probabilistically some observations occur on average only once every decade, or once every hundred years or more, indicates that so far we might have been just lucky in not observing a worse crash.


To specify these extreme probabilities, the finance literature usually prefers a “normal” stochastic process, like a Brownian motion, or a lognormal distribution with autoregressive conditional heteroskedasticity (ARCH) errors. That may be valid in a risk-neutral environment, but for the risk-averse investor an implied disaster probability of virtually zero might be fatal. Empirically, we know that financial prices do not at all behave as if they were normally distributed. The pioneering studies by Mandelbrot (1963) and Fama (1963) acknowledge that the observed fat tails are not well captured by normal distributions. Therefore, the Paretian or sum-stable distributions (including the normal as a special case) have been suggested as a likely alternative. Cornew et al. (1984) give empirical applications for different distributions nested within this class. Praetz (1972) and Blattberg and Gonedes (1974) proposed yet another class of distributions with one major advantage over the Paretian class: the Student-t, while still fat-tailed, has a finite variance, unlike the Paretian. This fits better with the assumptions underlying asset pricing models.

An alternative model that also retains the finite-variance property is Engle’s (1982) ARCH process for normally distributed innovations in asset prices. Instead of focusing on the unconditional distribution, ARCH specifies a conditional distribution for the variance. However, the apparent popularity of these models for describing clusters in volatility is not matched by their capacity to generate sufficient excess kurtosis. A second normality-preserving approach is the mixtures-of-distributions hypothesis (see Tauchen and Pitts, 1983). But due to the necessary specification of a mixing process, or variable, this approach tends to be difficult to implement.

Estimation Procedures

The first step one should take before engaging in any formal estimation is a simple plot of the empirical cumulative distribution function of a variable X versus a comparable (standardized by mean and variance) normal cumulative distribution. One can immediately observe the amount of excess empirical “frequency” in the lower or upper tail. Combined with more-than-normal probability in the center of the distribution, this phenomenon is known as leptokurtosis. Kurtosis (K) values exceeding 3, the normal value, point toward fat-tailed distributions. Unfortunately, a single extreme value may dramatically inflate the value of K. Formal testing of normality commonly uses the Jarque-Bera (JB) test, which combines K with the measure of skewness (the normal being a symmetric, non-skewed distribution): high values of JB point toward rejection of normality. Unfortunately, this test does not help us any further in indicating what an appropriate distribution would be.
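As an illustration, both moment measures can be computed directly from a sample. The Python sketch below (on simulated data, not from the original text) contrasts a fat-tailed Student-t sample with a normal benchmark; the sample sizes and degrees of freedom are illustrative choices:

```python
import numpy as np

def jarque_bera(x):
    """Return skewness S, kurtosis K, and the Jarque-Bera statistic.

    JB = (n / 6) * (S**2 + (K - 3)**2 / 4); under normality JB ~ chi2(2).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = x - x.mean()
    s2 = np.mean(z ** 2)
    skew = np.mean(z ** 3) / s2 ** 1.5
    kurt = np.mean(z ** 4) / s2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

rng = np.random.default_rng(0)
# Fat-tailed sample: Student-t with 5 degrees of freedom (kurtosis 9).
_, kurt_t, jb_t = jarque_bera(rng.standard_t(df=5, size=10_000))
# Thin-tailed benchmark: the normal itself (kurtosis 3).
_, kurt_n, jb_n = jarque_bera(rng.standard_normal(10_000))
```

The t sample produces K well above 3 and a large JB value, while the normal sample stays near K = 3, illustrating how leptokurtosis shows up in these statistics.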

More insight may be obtained by focusing exclusively on the tails and plotting the “extreme” empirical quantiles against different theoretical quantiles. For each candidate theoretical distribution, we can then apply a goodness-of-fit (GF) test:

\mathrm{GF} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (1)

where we split the empirical frequency distribution into k quantiles and compare the observed frequency per quantile (O_i) with the theoretically expected frequency for that quantile (E_i). This GF-test statistic is chi-squared distributed with k − 1 degrees of freedom.
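A minimal implementation of this test against a fitted normal might look as follows (a Python sketch on simulated data; the choice of k = 10 equiprobable bins is illustrative):

```python
import numpy as np
from statistics import NormalDist

def gf_statistic(x, k=10):
    """Chi-squared goodness-of-fit statistic against a fitted normal.

    The fitted normal is split into k equiprobable bins; observed bin
    counts O_i are compared with the expected counts E_i = n / k.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    nd = NormalDist(mu=x.mean(), sigma=x.std())
    # Bin edges at the normal quantiles j/k, j = 1..k-1.
    edges = [nd.inv_cdf(j / k) for j in range(1, k)]
    observed, _ = np.histogram(x, bins=[-np.inf] + edges + [np.inf])
    expected = n / k
    return float(np.sum((observed - expected) ** 2 / expected))

rng = np.random.default_rng(1)
gf_normal = gf_statistic(rng.standard_normal(10_000))     # small: good fit
gf_fat = gf_statistic(rng.standard_t(df=3, size=10_000))  # large: misfit
```

Under the null of normality the statistic is compared against a chi-squared distribution with k − 1 = 9 degrees of freedom; the fat-tailed t(3) sample produces a far larger value than the normal sample.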

Residual Life and Duration Models

The first formal model we discuss is usually encountered in the actuarial literature, where the concept of “mean residual life,” e(·), is specified as follows:

e(x) = E\left[\,X - x \mid X > x\,\right] = \frac{\int_x^{\infty} \left(1 - F(t)\right)\,dt}{1 - F(x)} \qquad (2)

This e(x) is the complete expectation of “life.” Obviously, “life” can be interpreted here as the remaining tail size of X, given that X is larger than some prespecified level x. This exceedance function can take several shapes depending on the underlying distribution function F(x) of X. The empirical e*(x) is a simple averaging process:

e^{*}(x) = \frac{1}{m} \sum_{i=1}^{m} \left(X_{(i)} - x\right), \qquad X_{(1)} \ge \cdots \ge X_{(m)} > x \qquad (3)

where the subscript (i) refers to the ordered observations X_(i), in descending order. The next step consists of fitting a theoretical e(·) to the empirical e*(·). Two techniques are typically used: maximum likelihood estimation, or minimizing a distance measure (the distance between the empirical F*(x) and the theoretical F(x), as in the GF-test). Failure time or duration models are very similar to these residual life models. They also condition on a prespecified high level x, and then fit different distributions to the remaining tail. For that purpose, a derived probability function, the so-called survival function S(x) = 1 − F(x), is used. After fitting S(x), we can specify an inverse survival function which generates quantiles Z(a) that are exceeded by X with some prespecified probability a.
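The empirical averaging step is straightforward to sketch in code (Python, on simulated data; the distributions and thresholds are illustrative). A constant mean residual life is characteristic of the exponential distribution, while a mean residual life that grows with the threshold is the classic fat-tail signature:

```python
import numpy as np

def mean_residual_life(x, threshold):
    """Empirical mean residual life e*(x): average exceedance above x."""
    x = np.asarray(x, dtype=float)
    exceedances = x[x > threshold] - threshold
    return float(exceedances.mean()) if exceedances.size else float("nan")

rng = np.random.default_rng(2)
# Exponential: e(x) is constant (memorylessness), here equal to the scale 2.
expo = rng.exponential(scale=2.0, size=500_000)
e_expo_low = mean_residual_life(expo, 1.0)
e_expo_high = mean_residual_life(expo, 4.0)
# Lomax/Pareto-type tail with index 3: e(x) = (1 + x) / 2 grows linearly.
lomax = rng.pareto(3.0, size=500_000)
e_lomax_low = mean_residual_life(lomax, 1.0)
e_lomax_high = mean_residual_life(lomax, 5.0)
```

Plotting e*(x) against x and looking for an upward trend is exactly the residual-life diagnostic described above.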

These fitting techniques have a drawback: if the distributions are not nested, or do not have a finite variance, they are no longer valid. The next tool avoids that problem.

Extreme Value Models

A stationary time series X_1, X_2, …, X_n of independent and identically distributed random variables has some unknown distribution function F(x). The probability that the maximum M_n of the first n random variables is below some prespecified level x is given as:

P(M_n \le x) = P(X_1 \le x, \ldots, X_n \le x) = F(x)^n \qquad (4)
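This product rule is easy to verify by simulation. The Python sketch below (illustrative, with n = 50 and a standard normal F) compares the empirical frequency of the maximum staying below a level with F(x)^n:

```python
import numpy as np
from statistics import NormalDist

# P(M_n <= x) = F(x)^n for iid draws: check by simulation
# for the standard normal with n = 50.
rng = np.random.default_rng(3)
n, trials = 50, 200_000
maxima = rng.standard_normal((trials, n)).max(axis=1)

x = 2.5
empirical = float(np.mean(maxima <= x))
theoretical = NormalDist().cdf(x) ** n  # about 0.73
```

The two numbers agree to within simulation error, confirming that the distribution of the maximum is fully determined by F, if F is known.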

Even though we do not know which F applies, extreme value theory shows that, after suitable normalization, this distribution of maxima converges asymptotically to a limit-law extreme value distribution G(x). G(x) can be of three types, the main distinguishing feature being the speed at which its tails decline. If they decline exponentially, the domain of attraction is given by the Gumbel distribution (encompassing the exponential and normal distributions). If, on the other hand, they decline by a power (hence much more slowly), the domain of attraction is given by the Frechet distributions (encompassing the fat-tailed Paretian and Student-t distributions). In the likely latter case, we can estimate the tail shape parameter α based on a sequence of the largest order statistics X_(i), the ordered empirical maxima, using the Hill (1975) estimator, in which m is the number of order statistics considered to constitute the tail of the sample. The choice of m is the controversial part of this procedure. Including too many observations from the center of the distribution will increase the bias, while restricting m too much will lead to an undesirable efficiency loss. So far, no undisputed procedure has been developed to estimate m. For a number of m-estimators, see Kalb et al. (1996). Fortunately, for financial applications choosing m is less critical, given the very long time series; the availability of transaction prices has further alleviated this problem. The Hill estimator is:

\frac{1}{\hat{\alpha}} = \frac{1}{m} \sum_{i=1}^{m} \left(\ln X_{(i)} - \ln X_{(m+1)}\right) \qquad (5)
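A minimal Hill-estimator sketch (Python; the exact Pareto sample generated by inverse transform and the choice m = 5,000 are illustrative assumptions, not from the original text):

```python
import numpy as np

def hill_estimator(x, m):
    """Hill (1975) tail-index estimate from the m largest order statistics.

    1 / alpha_hat = mean(ln X_(i) - ln X_(m+1)), i = 1..m, with the
    sample sorted in descending order.
    """
    x = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    gamma = np.mean(np.log(x[:m]) - np.log(x[m]))
    return 1.0 / gamma

rng = np.random.default_rng(4)
# Exact Pareto sample with alpha = 3, via inverse transform X = U**(-1/3).
pareto = rng.uniform(size=100_000) ** (-1.0 / 3.0)
alpha_hat = hill_estimator(pareto, m=5_000)  # should be close to 3
```

In practice one would recompute the estimate over a range of m values and look for a stable region, reflecting the bias-efficiency trade-off discussed above.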

With the chosen m, we can proceed with equation (5) and, based on the α-estimate, discriminate among a wide class of (not necessarily nested) distributions. For the Paretian distributions to be acceptable, α (their characteristic exponent) should be less than two. For values of α exceeding two, the Student-t distributions are more likely (where α equals the degrees of freedom). As α approaches infinity, a normal distribution is implied. Having an estimate for this tail parameter, we can also calculate exceedance quantiles:

\hat{x}_p = X_{(m+1)} \left(\frac{m}{np}\right)^{1/\hat{\alpha}} \qquad (6)

where the tail parameter α has been obtained above, and p is some chosen probability. Since we are interested in threshold levels x_p for which 1 − F(x_p) = p is extremely small (in fact p < 1/n), the empirical distribution function is no longer of use. Extreme value theory, however, does allow probability statements beyond this p-limit by extrapolating the empirical distribution function based on the estimated tail shape. Hence, we can even make “precise” statements on the probability of so-far-unobserved catastrophic observations.
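Such an extrapolated quantile can be sketched as follows (Python; the simulated Pareto sample is illustrative, and for simplicity the true tail index α = 3 is passed in place of an estimate):

```python
import numpy as np

def tail_quantile(x, m, alpha_hat, p):
    """Extrapolated threshold x_p with 1 - F(x_p) = p, valid even for p < 1/n.

    x_p_hat = X_(m+1) * (m / (n * p)) ** (1 / alpha_hat)
    """
    x = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    n = x.size
    return float(x[m] * (m / (n * p)) ** (1.0 / alpha_hat))

rng = np.random.default_rng(5)
# Exact Pareto(3) sample: the true quantile is x_p = p ** (-1 / 3).
sample = rng.uniform(size=100_000) ** (-1.0 / 3.0)
# Extrapolate far beyond the sample range: p = 1e-6 < 1/n = 1e-5.
x_p = tail_quantile(sample, m=5_000, alpha_hat=3.0, p=1e-6)
```

Here the true quantile is (1e-6)^(−1/3) = 100, a level that with high probability never appears in a sample of 100,000 observations, yet the estimator recovers it from the tail shape.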

The major advantage of extreme value distributions is that they are limit distributions. This implies that regardless of the data generating F(x), for large values of m, they become good approximations. If F(x) were known, of course, the use of limiting distributions should be avoided. In that case the true distribution of extremes is known as well.

In conducting tail analysis, we often assume that the extreme observations are independently and identically distributed. However, it is well known that clusters occur in both small and large values. ARCH models are especially designed to capture this phenomenon. In principle, however, each of the above-mentioned tail approaches can be adapted to cover clustering effects. One attractive “mixing” approach is given by the EM models.

Mixtures Modelling by Expectation Maximization (EM) Analysis

Kalb et al. (1996) develop a novel approach where the extreme value method is combined with an EM algorithm to capture potential mixtures of distributions. Since maximum likelihood is generally preferred as the statistically most efficient approach, we would like to incorporate its application while acknowledging the non-nesting problems. A two-stage procedure is proposed where the discrimination among classes of distributions is conducted by extreme value estimation. This results in a tail parameter which indicates the appropriate class. The second stage then exploits this information by further refining the parameter estimates (α plus additional characterizing parameters) by maximum likelihood estimation. We use the extreme value distribution class as input in the likelihood function, and reestimate its parameters. This also provides a check on the appropriateness of the chosen m. If the updated tail parameter differs too much from the extreme value stage, one has to rethink the choice of m and repeat the first stage. The extended parameter set in the second stage allows for incorporating particular anomalies in the tail observations, like size clustering. The actuarial application given in Kalb et al. (1996) can easily be adjusted to also allow for the temporal clustering effects that are typical of financial time series.

Some Empirical Findings

The empirical evidence based on residual life models seems restricted to actuarial applications, with mixed results (see Hogg and Klugman, 1983). Increasing “residual lives” indicate either a Paretian distribution, or perhaps a Weibull or lognormal distribution. Fitting is performed by maximum likelihood, after which a likelihood ratio test can be used to discriminate among these distributions.

Extreme Value Estimates

Since residual life plots are rather limited, we may resort to more discriminating techniques. Extreme value theory has by now been applied to many financial time series. Examples are Jansen and de Vries (1991) for stock returns, and Koedijk et al. (1990) for exchange rates. Empirical evidence for exchange rates points toward extremely fat-tailed Paretian distributions. This is even true for fixed exchange rates, perhaps due to the inevitable occasional devaluations.

Consequences of Fat-Tailedness

When we know (or have an estimate for) the tail probabilities, how can we usefully apply them? First of all, we have to be sure that our probability estimates have a low standard error. Both exaggerating and underestimating extreme risk could be very costly. The following applications will briefly indicate why it is important to optimally determine the tails. Obviously, probability statements do not help us in forecasting asset prices (except perhaps in option pricing), nor do they indicate when a particularly large observation is going to occur. It does help, however, to attach appropriate probabilities of occurrence to these observations, and act accordingly in trading off expected returns against risk.

Risk Specification

Asset pricing, be it in a CAPM, APT, or even option-pricing setting, is usually performed with a mean-variance framework in mind. This utility-maximizing approximation does not leave room for higher-order moments, i.e., it ignores the risk captured by the tails of the distribution. Safety-first models, as proposed in the 1950s by Roy (1952), do allow this risk to enter the portfolio decision process. In de Haan et al. (1994) it is shown how extreme value theory can be used to assist in comparing portfolio classes based on prespecified risk probabilities (p) and the accompanying exceedance quantiles as derived from equation (6) above.

Limits and Other (Temporary) Trading Suspensions

The stock market crashes in the late 1980s led to a demand for smoother news-absorption mechanisms in times of extreme price changes. Circuit breakers have become the latest regulatory fad, but they were preceded on most commodity exchanges by price limits. Although price limits are more rigid, and therefore potentially more distorting, they clearly outperform circuit breakers as far as small investors are concerned. Since an exchange would like to distort the trading process as little as possible, how should it set a price limit, or the price change that invokes a circuit breaker? Obviously, this problem translates into specifying an appropriate p in equation (6) and then calculating the accompanying quantile. Judging by the exchanges where price limits have been operational, a very small p has evidently been selected.

Margins

In Kofman (1993), extreme value theory is applied to a futures margin setting. To protect the integrity of the exchange, clearing houses usually require a specified percentage of the contract value to be maintained in a margin deposit by traders on the exchange. These margins are then passed on (marked up) to final customers. Since margins have to be maintained daily (and sometimes even more frequently), the optimal margin level should be sufficient to cover a prespecified maximum (extreme) price change, a level which is exceeded with only, for example, 0.01 percent probability. Obviously, both the clearing house and the traders want to keep margins as low as possible to attract maximum order flow, while securing the exchange’s financial viability.

Bid-Ask Spreads

Market makers quote bid-ask spreads to be compensated for the cost of generating market liquidity. This cost can be split into three components: the risk of holding temporary open positions, the risk of asymmetric information, and normal order-processing costs. In highly competitive markets, the latter component will typically dominate the size of the spread. However, in small illiquid markets the other two components become dominant. Both are directly related to the risk of sudden large price changes. A rational market maker should then incorporate these extremal probabilities in optimally setting the spread. In Kofman and Vorst (1994), these tail probabilities are estimated from Bund futures transaction returns. It seems that market makers are well compensated for the actual risk they incur in holding open positions. This may, however, be a characteristic of a highly liquid market (in this case LIFFE, in London); for small, illiquid exchanges these risks can be expected to be much higher.

Thresholds

Before the demise of the European Monetary System in 1992, currency speculators could take almost riskless futures and/or forward positions in EMS currencies if interest rates were out of line with covered interest parity. Compulsory monetary interventions guaranteed effective price limits. The newly proposed target zones, which allow occasional exceedances, may change all that. The occasional exceedances will induce excessive fat-tailedness in the exchange rate returns distribution, as was observed in Koedijk et al. (1990). These sudden jump probabilities can then no longer be neglected.

This short list of applications in finance illustrates the importance of appropriate inference on the shape of the tail of asset price (or its return) distributions. The tools introduced above are (relatively) easy to apply and should be considered before deciding to enter promising high yield markets. Arbitrage tells us that every excess return has its price in risk; maybe these excesses do not always compensate for the ultimate, extreme, risk.
