Outlier Detection in GARCH Models
Outliers are data points that are grossly different from or inconsistent with the rest of the data (Han & Kamber 2001). The usual strategy for outlier mining is to find a model that captures as much of the information in the normal data as possible and to treat samples inconsistent with that model as outliers. Based on this strategy, numerous successful outlier mining models have been proposed, which can be categorized into four approaches: the statistical approach, the distance-based approach, the deviation-based approach, and the density-based approach.
The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model was introduced by Bollerslev (1986). It is an econometric model for modeling and forecasting the time-dependent variance, and hence the volatility, of stock price returns. It expresses the current variance in terms of past variances. The parameters of the model are usually determined by Maximum Likelihood Estimation.
The basic GARCH model is typically referred to as the GARCH(1,1) model. The (1,1) in parentheses is standard notation in which the first number refers to how many autoregressive lags, or Autoregressive Conditional Heteroscedasticity (ARCH) terms (Gourieroux 1997), appear in the equation, while the second number refers to how many moving average lags are specified, often called the number of GARCH terms. Sometimes models with more than one lag are needed to obtain good variance forecasts.
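To illustrate the recursion described above, the following sketch computes GARCH(1,1) conditional variances; the parameter names omega, alpha and beta and the numeric values are assumptions for illustration only, since in practice they would be obtained by Maximum Likelihood Estimation.

import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    # GARCH(1,1): sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1],
    # i.e. one ARCH term (lagged squared return) and one GARCH term (lagged variance).
    sigma2 = np.empty(len(returns))
    sigma2[0] = np.var(returns)  # initialize with the sample variance
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical parameter values, for illustration only.
returns = np.random.normal(0.0, 0.02, size=500)
sigma2 = garch11_variance(returns, omega=1e-6, alpha=0.08, beta=0.90)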
The GARCH model is a popular approach to abnormal return detection. Franses and Dijk (2000) studied this issue and adapted the outlier detection method proposed by Chen and Liu (1993). They generated critical values for the relevant test statistic and evaluated their method in an extensive simulation study. This outlier detection and correction method was applied to ten years of weekly returns, from 1986 to 1995, on the stock markets of Amsterdam, Frankfurt, Paris, Hong Kong, Singapore and New York, which amounts to approximately 500 observations. Franses and Dijk (2000) then used weekly data from 1996 to 1998 to evaluate the out-of-sample forecast performance of conditional volatility with GARCH(1,1) models estimated on the series before and after outlier correction. The results show that correcting for a few outliers yields substantial improvements in out-of-sample forecasts.
Outlier Test
Dixon (1950) first introduced his ratio R to test for outliers in a sample. It has been shown to be robust and applicable to any distribution (Chernick 1982). In the Dixon Ratio Test, the range of the test values is calculated, and the results are used to measure the variation of the stock markets. The Dixon Ratio R is the ratio of the difference between the two highest values to the range of all samples. Let H1 be the highest value and H2 be the second highest value. Let LV be the lowest value.
R = (H1 - H2) / (H1 - LV) (1)
The closer the value of R is to one, the more likely it is that the highest value comes from another distribution and is an outlier with respect to the current set of values.
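As a minimal sketch of formula (1) (the function and variable names below are ours, not from Dixon 1950):

def dixon_ratio(values):
    # Dixon Ratio R from formula (1): (H1 - H2) / (H1 - LV).
    ordered = sorted(values, reverse=True)
    h1, h2 = ordered[0], ordered[1]  # highest and second highest values
    lv = ordered[-1]                 # lowest value
    return (h1 - h2) / (h1 - lv)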
The Dixon Ratio Test detects a single value that deviates extremely from the rest of the data. However, it fails to identify outliers when all of the top k highest values are outliers. Therefore, in our research, we used a modified Dixon Ratio to test for outliers (Luo et al. 2008). We define LA as the average of all values except the highest values and replace H2 in formula (1) with LA. Our test ratio is calculated as follows:
R = (H1 - LA) / (H1 - LV) (2)
This adjustment makes R suitable for detecting multiple outliers.
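A corresponding sketch of formula (2), under the reading that LA excludes only the single highest value (names again ours):

def modified_dixon_ratio(values):
    # Modified Dixon Ratio from formula (2): (H1 - LA) / (H1 - LV).
    ordered = sorted(values, reverse=True)
    h1, lv = ordered[0], ordered[-1]
    la = sum(ordered[1:]) / (len(ordered) - 1)  # average of all values except the highest
    return (h1 - la) / (h1 - lv)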