Recent Advances of Exception Mining in Stock Market - Evolving Application Domains of Data Warehousing and Mining

Description of V-BOMM and P-BOMM

results. In our research, we choose the latter because it keeps the original features of the individual time series. This approach also makes it easier to build on previous research outcomes.

Another issue is which measures to select as the multiple time series for exception mining. For stock market surveillance, the experience of financial experts can guide the choice of measures. Price is the most important measure of stock performance. The outcomes of previous financial research can also inform the choice. For example, there is a great deal of financial research on the relationship between abnormal behavior and the response of a stock. Meulbroek (1992) studied the relationship between insider trading, price movement and trading amounts, and concluded that these elements are associated. Fishe & Robe (2002) reached a similar conclusion. Therefore, price movement and trading amount are regarded as good measures for detecting anomalies. Price movement can be measured by the price return and the price fluctuation range during one day. The price fluctuation range is the difference between the highest price and the lowest price in one day.
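
The two price measures can be sketched in a few lines of Python. This is only an illustration, not the authors' code. The fluctuation range follows the definition given above (highest minus lowest price of the day). The text does not give a formula for price return, so the simple close-to-close return used here is an assumption, as are the function names and the sample prices.

```python
from typing import List


def price_return(closes: List[float]) -> List[float]:
    """Simple close-to-close return (an assumed definition) for days 2..n."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def fluctuation_range(highs: List[float], lows: List[float]) -> List[float]:
    """Daily highest price minus daily lowest price."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    return [high - low for high, low in zip(highs, lows)]


# Hypothetical daily prices, for illustration only.
highs = [10.5, 11.0, 12.0]
lows = [9.5, 10.0, 10.5]
closes = [10.0, 11.0, 11.0]

returns = price_return(closes)          # one value per day after the first
ranges = fluctuation_range(highs, lows)  # one value per day
```

Each resulting list can then be treated as one of the time series on which outliers are mined.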

Our OMM consists of two components: generators of outliers on individual time series and integrators of multiple time series. The generators produce outliers by using existing outlier mining technologies. Currently, we use VOMM (Qi & Wang 2004) for this task, because it has been shown to be an effective and efficient outlier mining technology for stock market surveillance. The outliers generated are then passed to the integrators. An integrator combines the results from the multiple time series in order to refine them. We propose two approaches in our research: one is based on majority voting (V-BOMM) and the other is based on probabilities (P-BOMM).
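
The generator/integrator split can be pictured as two pluggable functions. The sketch below is a rough illustration under stated assumptions, not the authors' implementation: `top_k_deviation` is a hypothetical stand-in generator (it is not VOMM, whose details are not given here), `flagged_everywhere` is a placeholder integrator, and all names and sample values are invented for the example.

```python
from typing import Callable, List, Sequence

Series = Sequence[float]
Candidates = List[int]  # indices of candidate outlier points


def top_k_deviation(series: Series, k: int = 3) -> Candidates:
    """Stand-in generator (NOT VOMM): the k points farthest from the mean."""
    mean = sum(series) / len(series)
    ranked = sorted(
        range(len(series)),
        key=lambda i: abs(series[i] - mean),
        reverse=True,
    )
    return ranked[:k]


def flagged_everywhere(candidate_lists: List[Candidates]) -> Candidates:
    """Placeholder integrator: keep points flagged on every series."""
    common = set(candidate_lists[0])
    for candidates in candidate_lists[1:]:
        common &= set(candidates)
    return sorted(common)


def run_omm(
    series_list: Sequence[Series],
    generate: Callable[[Series], Candidates],
    integrate: Callable[[List[Candidates]], Candidates],
) -> Candidates:
    """Generate candidates per series, then pass all lists to the integrator."""
    candidate_lists = [generate(series) for series in series_list]
    return integrate(candidate_lists)


# Hypothetical data: index 2 is unusual on all three series.
x = [1.0, 1.1, 9.0, 1.0, 0.9]
y = [5.0, 5.2, 0.1, 5.1, 4.9]
z = [2.0, 2.0, 7.5, 2.1, 1.9]
result = run_omm([x, y, z], lambda s: top_k_deviation(s, k=1), flagged_everywhere)
```

Keeping the two roles behind separate callables means a different generator or integrator can be swapped in without touching the rest of the pipeline.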

To illustrate the proposed OMM clearly, we provide an example and demonstrate how V-BOMM and P-BOMM work.

Given 100 points on three time series X, Y and Z, which are described as $[P_1(x_1, y_1, z_1), P_2(x_2, y_2, z_2), \ldots, P_{100}(x_{100}, y_{100}, z_{100})]$, where $x_1, x_2, \ldots, x_{100}$ represent the values of the points on X, $y_1, y_2, \ldots, y_{100}$ represent the values of the points on Y, and $z_1, z_2, \ldots, z_{100}$ represent the values of the points on Z.

First, we generate a list of candidate outliers for each of the three time series by using VOMM. The number of candidate outliers is determined based on domain experience. Generally speaking, the fewer candidate outliers there are, the more accurate the result is, but the worse the coverage. In this example, we choose 3 candidate outliers on each time series. Assume that the list of candidate outliers obtained from time series X is $[P_1, P_3, P_5]$, and that the candidate outliers from Y and Z are $[P_1, P_3, P_{10}]$ and $[P_1, P_5, P_2]$ respectively. After that, V-BOMM is used to refine the candidate outliers. V-BOMM produces the final outliers by majority voting. There are 3 time series in total, so the majority should be no less than 2. That is, if a point appears in 2 or more lists of candidate outliers, it is regarded as one of the final outliers. In the above example, $P_1$, $P_3$ and $P_5$ are the final outliers because they appear in 2 or 3 of the above lists ($P_1$ in all three, $P_3$ and $P_5$ in two each). By contrast, $P_{10}$ and $P_2$ are not included in the final outliers because each appears in only one candidate list.
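
The voting step in this example can be reproduced with a short Python sketch. It is a minimal illustration rather than the authors' implementation: the function name `majority_vote` is hypothetical, and the threshold for an even number of series (a strict majority) is an assumption, since the text only covers the case of 3 series.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(candidate_lists: Sequence[Sequence[str]]) -> List[str]:
    """Return the points that appear in a majority of the candidate lists."""
    # Strict majority: 2 of 3 series, as in the example (assumed for other sizes).
    threshold = len(candidate_lists) // 2 + 1
    # Count each point at most once per list.
    votes = Counter(p for candidates in candidate_lists for p in set(candidates))
    # Keep first-seen order so the output is easy to compare with the text.
    ordered: List[str] = []
    for candidates in candidate_lists:
        for point in candidates:
            if point not in ordered:
                ordered.append(point)
    return [point for point in ordered if votes[point] >= threshold]


# Candidate lists from the example above.
x_candidates = ["P1", "P3", "P5"]
y_candidates = ["P1", "P3", "P10"]
z_candidates = ["P1", "P5", "P2"]

final_outliers = majority_vote([x_candidates, y_candidates, z_candidates])
print(final_outliers)  # ['P1', 'P3', 'P5']
```

P1 receives three votes, P3 and P5 receive two each, and P10 and P2 receive one each, so only the first three reach the threshold of 2.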

P-BOMM produces the final points ranked by their probabilities of being an outlier. First, we generate a list of candidate outliers for each of the three time series by VOMM. At the same time, an outlier test ratio is calculated based on Formula (2). This ratio gives the probability of being an outlier for each point. For example, one list of candidate outliers could be $[\{P_1, 98\%\}, \{P_3, 92\%\}, \{P_9, 88\%\}]$ on
