LOSS FUNCTIONS (Social Science)

The loss function (or cost function) is a crucial ingredient in any optimization problem, such as statistical decision theory, policymaking, estimation, forecasting, learning, classification, and financial investment. The discussion here is limited to the use of loss functions in econometrics, particularly in time series forecasting.

Let $Y_{t+1}$ be the variable of interest and $f_t$ its forecast made at time $t$ using the information set $I_t$. The forecast error is $e_{t+1} = Y_{t+1} - f_t$, and a loss function $c(e)$ measures the cost of that error, with $c(0) = 0$ and $c(e)$ increasing as $|e|$ grows.

LOSS FUNCTIONS AND RISK

Granger (2002) notes that an expected loss (a risk measure) of financial return $Y_{t+1}$ that has a conditional predictive distribution $F_t(y) \equiv \Pr(Y_{t+1} \le y \mid I_t)$ may be written as

$$E\,c(e) = A_1 \int_{f}^{\infty} |y - f|^{\theta}\, dF_t(y) + A_2 \int_{-\infty}^{f} |y - f|^{\theta}\, dF_t(y),$$

with $A_1, A_2$ both positive and $\theta > 0$. The absolute-power risk measures $E\,|Y_{t+1} - f|^{\theta}$ obtain in the symmetric case $A_1 = A_2$.


Zhuanxin Ding, Clive Granger, and Robert Engle (1993) study the time series and distributional properties of these measures empirically and show that the absolute deviations have some particular properties, such as the longest memory. Granger remarks that, given that financial returns are known to come from a long-tail distribution, $\theta = 1$ may be preferable.

Another problem raised by Granger is how to choose the value of $\theta$ in the loss function $c(e) = |e|^{\theta}$. We consider some variant loss functions with $\theta = 1, 2$ below.

LOSS FUNCTIONS AND REGRESSION FUNCTIONS

Optimal forecasting of a time series model depends extensively on the specification of the loss function. The symmetric quadratic loss function is the most prevalent in applications owing to its simplicity. The optimal forecast under quadratic loss is simply the conditional mean, but an asymmetric loss function implies a more complicated forecast that depends on the distribution of the forecast error as well as on the loss function itself (Granger 1999), as the expected loss is formulated with the expectation taken with respect to the conditional distribution. Specification of the loss function defines the model under consideration.

The optimal forecast $f_t$ for the loss function $c(\cdot)$ solves

$$\min_{f} \int c(y - f)\, dF_t(y),$$

where $F_t(y) \equiv \Pr(Y_{t+1} \le y \mid I_t)$ is the conditional distribution of $Y_{t+1}$. When we interchange the operations of differentiation and integration, the first-order condition

$$E\!\left[\frac{\partial}{\partial f}\, c(Y_{t+1} - f_t)\,\Big|\, I_t\right] = 0,$$

written in terms of the generalized forecast error $g_{t+1} \equiv \partial c(Y_{t+1} - f_t)/\partial f$, forms the condition of forecast optimality:

$$E(g_{t+1} \mid I_t) = 0 \quad \text{a.s.},$$

that is, a martingale difference (MD) property of the generalized forecast error. This optimality condition determines the regression function appropriate to the specified loss function $c(\cdot)$.

To see this, consider the following two examples. First, when the loss function is the squared error loss $c(Y_{t+1} - f_t) = (Y_{t+1} - f_t)^2$, the generalized forecast error is $g_{t+1} = -2(Y_{t+1} - f_t)$, so the optimality condition $E(Y_{t+1} - f_t \mid I_t) = 0$ makes the optimal forecast the conditional mean, $f_t = E(Y_{t+1} \mid I_t)$. Second, when the loss is the absolute error loss $c(Y_{t+1} - f_t) = |Y_{t+1} - f_t|$, the optimality condition implies that the optimal forecast is the conditional median of $Y_{t+1}$.
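This correspondence between loss functions and regression functions can be checked numerically. A minimal sketch (distribution, sample size, and grid chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Skewed data, so the mean and the median differ noticeably
y = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)

grid = np.linspace(0.1, 3.0, 1000)

# Empirical expected loss for each candidate point forecast f
sq_risk = np.array([np.mean((y - f) ** 2) for f in grid])    # squared error loss
abs_risk = np.array([np.mean(np.abs(y - f)) for f in grid])  # absolute error loss

f_sq = grid[np.argmin(sq_risk)]    # optimal forecast under squared loss
f_abs = grid[np.argmin(abs_risk)]  # optimal forecast under absolute loss

# f_sq tracks the sample mean, f_abs the sample median
print(f_sq, y.mean())
print(f_abs, np.median(y))
```

Because the sample is right-skewed, the squared-loss minimizer (the mean) lies above the absolute-loss minimizer (the median), illustrating how the choice of loss changes the optimal forecast.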

LOSS FUNCTIONS FOR TRANSFORMATIONS

Granger (1999) notes that it is implausible to use the same loss function for forecasting $Y_{t+1}$ and for forecasting a transformed variable $h(Y_{t+1})$, where $h(\cdot)$ is some known function. Let $f_t^*$ be the optimal forecast of $Y_{t+1}$ under a loss $c_1(\cdot)$ and $h_t^*$ the optimal forecast of $h(Y_{t+1})$ under a loss $c_2(\cdot)$, so that

$$E\!\left[\frac{\partial}{\partial f}\, c_1\big(Y_{t+1} - f_t^*\big)\,\Big|\, I_t\right] = 0 \quad \text{and} \quad E\!\left[\frac{\partial}{\partial h}\, c_2\big(h(Y_{t+1}) - h_t^*\big)\,\Big|\, I_t\right] = 0.$$

It is easy to see that the optimality condition for $f_t^*$ does not imply the optimality condition for $h_t^*$ in general. Under some strong conditions on the functional forms of the transformation $h(\cdot)$ and of the two loss functions $c_1(\cdot)$, $c_2(\cdot)$, the two conditions may coincide. Granger (1999) remarks that it would be strange behavior to use the same loss function for $Y$ and $h(Y)$. This awaits further analysis in future research.
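A small numerical sketch makes the point concrete for the transformation $h(Y) = Y^2$ under squared loss (the distribution and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=200_000)

# Optimal forecast of Y under squared loss: the (sample) mean
f_star = y.mean()
# Optimal forecast of h(Y) = Y^2 under squared loss: the mean of Y^2
h_star = (y ** 2).mean()

# h(f_star) is far from h_star: E[Y]^2 = 1 while E[Y^2] = 1 + 2^2 = 5
print(f_star ** 2, h_star)
```

Transforming the optimal forecast of $Y$ gives roughly $1$, while the optimal forecast of $Y^2$ is roughly $5$: the optimality condition does not survive the transformation.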

LOSS FUNCTIONS FOR ASYMMETRY

The most prevalent loss function for the evaluation of a forecast is the symmetric quadratic function. Negative and positive forecast errors of the same magnitude have the same loss. This functional form is assumed because mathematically it is very tractable, but from an economic point of view, it is not very realistic. For a given information set and under a quadratic loss, the optimal forecast is the conditional mean of the variable under study. The choice of the loss function is fundamental to the construction of an optimal forecast. For asymmetric loss functions, the optimal forecast can be more complicated as it will depend not only on the choice of the loss function but also on the characteristics of the probability density function of the forecast error (Granger 1999).

As Granger (1999) notes, the overwhelming majority of forecast work uses the cost function $c(e) = a e^2$, $a > 0$, largely for mathematical convenience. An asymmetric loss function is often more relevant, however. A few examples from Granger (1999) follow. The cost of arriving ten minutes early at the airport is quite different from that of arriving ten minutes late. The cost of having a computer that is 10 percent too small for a task is different from the cost of its being 10 percent too big. The loss from booking a lecture room with ten seats too many for your class differs from that of a room with ten seats too few. In dam construction, an underestimate of the peak water level is usually much more serious than an overestimate (Zellner 1986).

A simple example of an asymmetric loss is the lin-lin (check) loss $c(e) = [\alpha - \mathbf{1}(e < 0)]\,e$ with $0 < \alpha < 1$, which is linear on each side of the origin but penalizes positive and negative errors at different rates.

A particularly interesting asymmetric loss is the linex function of Hal Varian (1975), which takes the form

$$c(e) = \exp(\alpha e) - \alpha e - 1,$$

where $\alpha$ is a scalar controlling the aversion toward positive ($\alpha > 0$) or negative ($\alpha < 0$) forecast errors: the loss is approximately exponential on one side of the origin and approximately linear on the other. A related loss is the double linex,

$$c(e) = \exp(\alpha e) + \exp(-\beta e) - (\alpha - \beta)\,e - 2, \qquad \alpha, \beta > 0,$$

which is exponential for all values of $e$ (Granger 1999). When $\alpha = \beta$, it becomes the symmetric double linex loss function.
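Under linex loss $c(e) = \exp(\alpha e) - \alpha e - 1$ with $e = Y - f$, the optimal forecast for Gaussian $Y$ with mean $\mu$ and variance $\sigma^2$ is $\mu + \alpha\sigma^2/2$, biased away from the mean. A numerical sketch (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, alpha = 0.0, 1.0, 0.5
y = rng.normal(mu, sigma, size=100_000)

grid = np.linspace(0.0, 0.5, 601)
# Empirical expected linex loss c(e) = exp(alpha*e) - alpha*e - 1, e = y - f
risk = np.array([np.mean(np.exp(alpha * (y - f)) - alpha * (y - f) - 1)
                 for f in grid])
f_star = grid[np.argmin(risk)]

# Theory: optimal forecast is mu + alpha*sigma^2/2 = 0.25, an upward bias
# reflecting the heavier (exponential) penalty on positive errors
print(f_star)
```

The grid minimizer lands near $0.25$, not the mean $0$, showing how asymmetry shifts the optimal forecast.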

LOSS FUNCTIONS FOR FORECASTING FINANCIAL RETURNS

Some simple examples of the loss function for evaluating the point forecasts of financial returns are the out-of-sample means of the following loss functions studied in Yongmiao Hong and Tae-Hwy Lee (2003): the squared error loss $c(y, \hat y) = (y - \hat y)^2$, the absolute error loss $c(y, \hat y) = |y - \hat y|$, the trading return loss $c(y, \hat y) = -\operatorname{sign}(\hat y) \cdot y$, and the correct direction loss $c(y, \hat y) = -\mathbf{1}\big(\operatorname{sign}(\hat y) \cdot \operatorname{sign}(y) > 0\big)$, where $\mathbf{1}(\cdot)$ takes the value of 1 if the statement in the parentheses is true and 0 otherwise. The negative signs in the latter two are to make them losses to minimize (rather than to maximize). The out-of-sample means of these loss functions over $P$ forecasts $\hat f_{t,1}$ (the one-step-ahead forecast of $Y_{t+1}$ made at time $t$) are the mean squared forecast error (MSFE), mean absolute forecast error (MAFE), mean forecast trading return (MFTR), and mean correct forecast direction (MCFD):

$$\mathrm{MSFE} = P^{-1}\sum_t (Y_{t+1} - \hat f_{t,1})^2, \qquad \mathrm{MAFE} = P^{-1}\sum_t |Y_{t+1} - \hat f_{t,1}|,$$

$$\mathrm{MFTR} = P^{-1}\sum_t \operatorname{sign}(\hat f_{t,1})\, Y_{t+1}, \qquad \mathrm{MCFD} = P^{-1}\sum_t \mathbf{1}\big(\operatorname{sign}(\hat f_{t,1}) \cdot \operatorname{sign}(Y_{t+1}) > 0\big).$$

These loss functions may further incorporate issues such as interest differentials, transaction costs, and market depth. Because investors are ultimately trying to maximize profits rather than minimize forecast errors, MSFE and MAFE may not be the most appropriate evaluation criteria. Granger (1999) emphasizes the importance of model evaluation using economic measures such as MFTR rather than statistical criteria such as MSFE and MAFE. Note that MFTR for the buy-and-hold trading strategy with $\operatorname{sign}(\hat f_{t,1}) = 1$ is the unconditional mean return $E(Y_{t+1})$. MCFD is closely associated with an economic measure as it relates to market timing. Mutual fund managers, for example, can adjust investment portfolios in a timely manner if they can predict the directions of changes, thus earning a return higher than the market average.
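The four criteria can be computed directly from forecast and realization series; a minimal sketch (the function name and sample data are hypothetical):

```python
import numpy as np

def forecast_metrics(y, f):
    """MSFE, MAFE, MFTR, MCFD for realized returns y and point forecasts f."""
    e = y - f
    msfe = np.mean(e ** 2)                       # mean squared forecast error
    mafe = np.mean(np.abs(e))                    # mean absolute forecast error
    mftr = np.mean(np.sign(f) * y)               # mean forecast trading return
    mcfd = np.mean(np.sign(f) * np.sign(y) > 0)  # mean correct forecast direction
    return msfe, mafe, mftr, mcfd

# Four out-of-sample periods of illustrative returns and forecasts
y = np.array([0.02, -0.01, 0.03, -0.02])
f = np.array([0.01, 0.01, 0.02, -0.01])
print(forecast_metrics(y, f))
```

Here three of four directions are called correctly (MCFD = 0.75) even though the second forecast has the wrong sign, illustrating how the direction-based and error-based criteria rank forecasts differently.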

LOSS FUNCTIONS FOR ESTIMATION AND EVALUATION

When the forecast is based on an econometric model, the model must be estimated before the forecast is constructed. In applications, however, the loss function used for estimation of a model is often different from the one(s) used in evaluating it: we may estimate the parameters under a symmetric quadratic objective function yet evaluate the model-based forecast under an asymmetric loss. This logical inconsistency is not inconsequential for tests assessing the predictive ability of competing models, because the uncertainty introduced by parameter estimation affects any test based on the forecasts and may result in invalid inference of predictive ability (West 1996). When the objective function in estimation is the same as the loss function in forecasting, the effect of parameter estimation vanishes. If one believes that a particular criterion should be used to evaluate forecasts, then it may also be used at the estimation stage of the modeling process. Gloria Gonzalez-Rivera, Tae-Hwy Lee, and Emre Yoldas (2007) show this in the context of the VaR model of RiskMetrics, which provides a set of tools to measure market risk and forecast the value-at-risk (VaR) of a portfolio of financial assets. RiskMetrics offers a prime example in which the loss function of the forecaster is very well defined. They point out that a VaR is a quantile, and thus the check loss function can be the objective function to estimate the parameters of the RiskMetrics model.
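The check loss can serve as the estimation objective for a quantile such as the VaR; a sketch using a simple grid search (distribution, VaR level, and grid are all illustrative):

```python
import numpy as np

def check_loss(u, alpha):
    # Check (tick) loss: rho_alpha(u) = (alpha - 1(u < 0)) * u
    return (alpha - (u < 0)) * u

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, size=100_000)
alpha = 0.05  # 5 percent VaR level

grid = np.linspace(-2.5, -1.0, 601)
risk = np.array([np.mean(check_loss(y - q, alpha)) for q in grid])
q_hat = grid[np.argmin(risk)]

# The check-loss minimizer is the alpha-quantile (about -1.645 for N(0,1))
print(q_hat)
```

Estimating the quantile by minimizing the same check loss used to evaluate the VaR forecast keeps the estimation and evaluation criteria consistent, which is the point of the section.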

LOSS FUNCTION FOR BINARY FORECAST AND MAXIMUM SCORE

When the variable of interest is binary, such as the direction of change $G_{t+1} = \mathbf{1}(Y_{t+1} > 0)$, a binary forecast may be evaluated by an asymmetric loss on the two types of misclassification, and estimation minimizing such a loss corresponds to the maximum score approach of Charles Manski (1975).

LOSS FUNCTIONS FOR PROBABILITY FORECASTS

Francis Diebold and Glenn Rudebusch (1989) consider probability forecasts for business-cycle turning points. To measure the accuracy of predicted probabilities, they use the average distance between the predicted probabilities and the observed realizations (as measured by a zero-one dummy variable). Suppose we have a time series of $P$ probability forecasts $\{p_t\}_{t=1}^{P}$ and the corresponding realizations $\{d_t\}_{t=1}^{P}$, with $d_t = 1$ if a business-cycle turning point (or any defined event) occurs in period $t$ and $d_t = 0$ otherwise. The loss function analogous to the squared error is Brier's score based on the quadratic probability score (QPS):

$$\mathrm{QPS} = P^{-1}\sum_{t=1}^{P} 2(p_t - d_t)^2.$$

The QPS ranges from 0 to 2, with 0 for perfect accuracy. As noted by Diebold and Rudebusch (1989), the use of the symmetric loss function may not be appropriate, as a forecaster may be penalized more heavily for missing a call (making a Type II error) than for signaling a false alarm (making a Type I error). Another loss function is given by the log probability score (LPS)

$$\mathrm{LPS} = -P^{-1}\sum_{t=1}^{P} \big[(1 - d_t)\ln(1 - p_t) + d_t \ln p_t\big],$$

which is similar to the loss for the interval forecast. Major mistakes are penalized more heavily under LPS than under QPS. Further loss functions are discussed in Diebold and Rudebusch (1989).

Another loss function useful in this context is the Kuipers score (KS), defined by KS = Hit Rate − False Alarm Rate, where the hit rate is the fraction of the bad events that were correctly predicted (power, or 1 − probability of Type II error), and the false alarm rate is the fraction of good events that were incorrectly predicted as bad events (probability of Type I error).
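The three scores can be sketched as follows (function names and sample data are hypothetical; events play the role of "bad events" for KS):

```python
import numpy as np

def qps(p, d):
    # Quadratic probability score: 0 (perfect) to 2 (worst)
    return np.mean(2 * (p - d) ** 2)

def lps(p, d):
    # Log probability score: penalizes large mistakes more heavily than QPS
    return -np.mean((1 - d) * np.log(1 - p) + d * np.log(p))

def kuipers(pred, d):
    # Kuipers score = hit rate - false alarm rate for 0/1 predictions
    hit_rate = np.mean(pred[d == 1])     # events correctly called
    false_alarm = np.mean(pred[d == 0])  # non-events incorrectly called
    return hit_rate - false_alarm

p = np.array([0.8, 0.2, 0.6, 0.1])  # predicted event probabilities
d = np.array([1, 0, 1, 0])          # realized events
pred = (p > 0.5).astype(int)        # 0/1 calls from the probabilities
print(qps(p, d), lps(p, d), kuipers(pred, d))
```

In this toy sample every direction is called correctly, so KS attains its maximum of 1, while QPS and LPS still register the imperfect probabilities.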

LOSS FUNCTION FOR INTERVAL FORECASTS

Suppose Y is a stationary series. Let the one-period-ahead conditional interval forecast made at time t from a model be denoted as

$J_t(p) = \big(L_t(p),\, U_t(p)\big)$, where $L_t(p)$ and $U_t(p)$ are the lower and upper limits of the interval forecast with a given coverage probability $p$, so that $\Pr\big(Y_{t+1} \in J_t(p) \mid I_t\big) = p$ if the model is correctly specified.

Hence, we can choose a model for interval forecasts with the smallest out-of-sample mean of the negative predictive log-likelihood defined by

$$-P^{-1}\sum_t \big[\mathbf{1}\big(Y_{t+1} \in J_t(p)\big)\ln p + \mathbf{1}\big(Y_{t+1} \notin J_t(p)\big)\ln(1 - p)\big].$$
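A correctly calibrated interval should attain a lower out-of-sample negative predictive log-likelihood than a miscalibrated one. A sketch assuming the binomial-form loss $-[\mathbf{1}(Y \in J)\ln p + \mathbf{1}(Y \notin J)\ln(1-p)]$ (the interval limits and distribution are illustrative):

```python
import numpy as np

def interval_loss(y, lo, hi, p):
    inside = (y >= lo) & (y <= hi)
    # Negative predictive log-likelihood of the interval forecast
    return -np.mean(inside * np.log(p) + ~inside * np.log(1 - p))

rng = np.random.default_rng(4)
y = rng.normal(0.0, 1.0, size=10_000)
p = 0.90

# Correctly calibrated 90 percent interval for N(0,1)
loss_good = interval_loss(y, -1.645, 1.645, p)
# A much-too-narrow interval misses often and is penalized
loss_bad = interval_loss(y, -0.5, 0.5, p)
print(loss_good, loss_bad)
```

The narrow interval's empirical coverage falls far below the stated $p = 0.90$, so its loss is several times larger than that of the calibrated interval.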

LOSS FUNCTION FOR DENSITY FORECASTS

Consider a time series $\{Y_t\}$ generated from an unknown true conditional density $\varphi_t(y) \equiv \varphi(y \mid I_{t-1})$, and let $\psi_t(y)$ denote a one-step-ahead density forecast produced by a model. The optimal density forecast is the true conditional density $\varphi_t(y)$ itself.

In practice, it is rarely the case that we can find an optimal model. As it is very likely that "the true distribution is in fact too complicated to be represented by a simple mathematical function" (Sawa 1978), all the models proposed by different researchers may well be misspecified, and we therefore regard each model as an approximation of the truth. Our task is then to investigate which density forecast model approximates the true conditional density most closely. We first have to define a metric measuring the distance of a given model from the truth, and then compare different models in terms of this distance.

The adequacy of a density forecast model can be measured by the conditional Kullback-Leibler information criterion (KLIC) divergence measure (Kullback and Leibler 1951) between two conditional densities,

$$I_t(\varphi : \psi) = E_{\varphi}\big[\ln \varphi_t(Y_t) - \ln \psi_t(Y_t)\big],$$

where the expectation is taken with respect to the true conditional density $\varphi_t(\cdot)$. Comparing two density forecast models $\psi^1$ and $\psi^2$, the KLIC differential $I_t(\varphi : \psi^2) - I_t(\varphi : \psi^1) = E_{\varphi}\big[\ln\big(\psi_t^1(Y_t)/\psi_t^2(Y_t)\big)\big]$, which is the expected log-ratio of the two predictive likelihood functions, does not involve the unknown true density. Treating model 1 as a benchmark model (for model selection) or as the model under the null hypothesis (for hypothesis testing), the out-of-sample average of $\ln\big(\psi_t^1(Y_t)/\psi_t^2(Y_t)\big)$ can be considered as a loss function to minimize. To sum up, the KLIC differential can serve as a loss function for density forecast evaluation, as discussed in Yong Bao, Tae-Hwy Lee, and Burak Saltoglu (2007).
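The KLIC differential between two density forecast models reduces to the average log-likelihood ratio, which can be sketched as follows (the densities and sample size are illustrative):

```python
import numpy as np

def norm_logpdf(y, mu, sigma):
    # Log density of N(mu, sigma^2) evaluated at y
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

rng = np.random.default_rng(5)
# "True" data-generating density: N(0, 2^2)
y = rng.normal(0.0, 2.0, size=50_000)

# Predictive log-likelihoods of two candidate density forecast models
ll1 = norm_logpdf(y, 0.0, 2.0)  # model 1: correctly specified
ll2 = norm_logpdf(y, 0.0, 1.0)  # model 2: understates the variance

# Out-of-sample average of the log-likelihood ratio (KLIC differential);
# positive values favor model 1
klic_diff = np.mean(ll1 - ll2)
print(klic_diff)
```

Because model 2 understates the variance, the average log-likelihood ratio is clearly positive (analytically about 0.81 here), correctly ranking model 1 as the closer approximation to the truth.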

LOSS FUNCTIONS FOR VOLATILITY FORECASTS

Gloria Gonzalez-Rivera, Tae-Hwy Lee, and Santosh Mishra (2004) analyze the predictive performance of various volatility models for stock returns. To compare the performance, they choose loss functions for which volatility estimation is of paramount importance. They deal with two economic loss functions (an option pricing function and a utility function) and two statistical loss functions (the check loss for a value-at-risk calculation and a predictive likelihood function of the conditional variance).

LOSS FUNCTIONS FOR TESTING GRANGER-CAUSALITY

In time series forecasting, a concept of causality is due to Granger (1969), who defined it in terms of the conditional distribution. Tae-Hwy Lee and Weiping Yang (2007) use loss functions to test for Granger-causality in conditional mean, in conditional distribution, and in conditional quantiles. The causal relationship between money and income (output) is an important topic that has been extensively studied, but almost entirely through Granger-causality in the conditional mean. Compared to the conditional mean, conditional quantiles give a broader picture of a variable in various scenarios. Lee and Yang (2007) explore whether forecasting the conditional quantile of output growth may be improved using money. They compare the check (tick) loss functions of the quantile forecasts of output growth with and without using past information on money growth, and assess the statistical significance of the loss differential of the unconditional and conditional predictive abilities. As conditional quantiles can be inverted to the conditional distribution, they also test for Granger-causality in the conditional distribution (using a nonparametric copula function). Using U.S. monthly series of real personal income and industrial production for income, and M1 and M2 for money, for 1959 to 2001, they find that out-of-sample quantile forecasting of output growth, particularly in the tails, is significantly improved by accounting for money. On the other hand, money-income Granger-causality in the conditional mean is quite weak and unstable. Their results have important implications for monetary policy, showing that the effectiveness of monetary policy has been underestimated by merely testing Granger-causality in mean. Money-income Granger-causality is stronger than previously known, and therefore the information on money growth can (and should) be more widely utilized in implementing monetary policy.
