Recent Advances of Exception Mining in Stock Market - Evolving Application Domains of Data Warehousing and Mining


structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications (Utgoff 2004). The machine learning technique for inducing a decision tree from data is called decision tree learning. A well-known decision tree algorithm is C4.5 (Quinlan 1993).
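The leaf and branch structure described above can be sketched in a few lines of Python. This is only an illustration of the data structure, not of C4.5 or any tree induction algorithm; the feature names (`volume_spike`, `news_pending`) and the class labels are hypothetical and do not come from the chapter.

```python
# A tiny hand-built decision tree (hypothetical features and labels).
# Internal nodes are dicts that test one feature; leaves are plain strings
# holding the classification.
tree = {
    "feature": "volume_spike",
    "branches": {
        True: {
            "feature": "news_pending",
            "branches": {True: "suspicious", False: "normal"},
        },
        False: "normal",
    },
}


def classify(node, sample):
    """Follow the branch matching each tested feature until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node


def rules(node, path=()):
    """Yield (conjunction_of_tests, classification) for every leaf."""
    if not isinstance(node, dict):
        yield path, node
        return
    for value, child in node["branches"].items():
        yield from rules(child, path + ((node["feature"], value),))


label = classify(tree, {"volume_spike": True, "news_pending": True})
all_rules = list(rules(tree))
```

Each entry in `all_rules` pairs a root-to-leaf path, read as a conjunction of feature tests, with the classification stored at that leaf.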

Logistic regression is a model used to predict the probability of occurrence of an event by fitting data to a logistic curve (Hosmer & Stanley 2000). It makes use of several predictor variables that may be either numerical or categorical. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index. Logistic regression is used extensively in the medical and social sciences as well as in marketing applications, such as prediction of a customer's propensity to purchase a product or cease a subscription.
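As a minimal sketch of the heart attack example, the logistic curve maps a weighted sum of the predictors to a probability between 0 and 1. The intercept and coefficients below are invented for illustration and are not fitted to any data; in practice they would be estimated from observations.

```python
import math


def logistic(z):
    """Logistic curve: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, NOT fitted to real data.
INTERCEPT = -9.0
COEF = {"age": 0.08, "male": 0.6, "bmi": 0.1}


def heart_attack_probability(age, male, bmi):
    """Probability from a numerical (age, bmi) and a categorical (male) predictor."""
    z = INTERCEPT + COEF["age"] * age + COEF["male"] * male + COEF["bmi"] * bmi
    return logistic(z)


p_young = heart_attack_probability(age=30, male=0, bmi=22)
p_old = heart_attack_probability(age=70, male=1, bmi=31)
```

The categorical predictor (sex) is encoded as 0 or 1, which is one common way to mix numerical and categorical variables in the same linear term.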

A neural network (NN) is a network of artificial neurons that uses a mathematical or computational model for information processing (Muller & Insua 1995). In most cases, a neural network is an adaptive system that changes its structure based on external or internal information that flows through the network.
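The adaptive behaviour can be illustrated with the simplest possible case, a single artificial neuron (a perceptron) whose weights change in response to the examples it sees. This is a sketch under simplifying assumptions: one neuron, a step activation, and the logical OR function as toy training data. It is not the network used in any of the cited studies.

```python
def predict(weights, bias, x):
    """Step-activation neuron: fires (1) when the weighted sum is positive."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else 0


def train(samples, epochs=10, lr=1):
    """Perceptron learning rule: nudge weights by the prediction error."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy data: the logical OR of two binary inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_SAMPLES)
predictions = [predict(weights, bias, x) for x, _ in OR_SAMPLES]
```

The weights start at zero and are adjusted only when a prediction is wrong, so the neuron's parameters are shaped entirely by the information flowing through it.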

Donoho (2003) studied the early detection of insider trading using data mining technologies. His research was inspired by McMillian's hypothesis that people with inside information leave evidence in option trading data that might predict news. To automate the analysis and discover unknown relationships, he used several data mining technologies to replace the large amount of human intuition and manual analysis in McMillian's method: C4.5, backward stepwise logistic regression, and neural networks. The experimental data came from three sources: option trading, stock trading, and news. Stock and option data were available for all U.S. companies for which options are traded (about 2160 companies). News covered these companies plus others. All three data sources were available for a six-month period from March 11, 2003 to September 17, 2003. An expert model was used to evaluate the results. All three algorithms produced lift over random and over the expert model, but no algorithm clearly outperformed the others.
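To make "lift over random" concrete, the sketch below uses one common definition: the hit rate among the k highest-scored cases divided by the overall hit rate that random selection would achieve. Whether Donoho computed lift in exactly this way is an assumption, and the scores and labels are made-up values, not results from the study.

```python
def lift_at_k(scores, labels, k):
    """Hit rate among the k highest-scored cases divided by the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:k])
    precision_at_k = top_hits / k
    base_rate = sum(labels) / len(labels)
    return precision_at_k / base_rate


# Hypothetical model scores and true labels (1 = event occurred).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

lift = lift_at_k(scores, labels, k=2)
```

A lift above 1 means the model's top-ranked cases contain more true events than a random sample of the same size would be expected to contain.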

Outlier Mining on Multiple Time Series in Stock Market

From the literature review, we can see that most exception detection technologies handle a single time series. It would be beneficial to integrate multiple time series, such as price, index, and trade amount. This is the motivation for our research on outlier mining on multiple time series (OMM). This section describes the design of OMM, the experiments, and the evaluation of OMM.

Outlier Mining on Multiple Time Series (OMM)

OMM is motivated by the goal of improving the accuracy of stock market surveillance. In Shannon's information theory, information is defined as that which removes or reduces uncertainty (Cover & Thomas 1991). For an outlier detection task, more information means higher accuracy of the outlier detection model, since the identified outliers are more likely to differ from the remaining data. For example, it is less accurate to measure a stock and identify outliers using price information only. The results will be more reasonable if we add one or more measures, such as volume, volatility and liquidity.
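A small numerical illustration of this point follows. It is not the OMM design itself: it simply scores each day by z-scores, first on price alone and then on price and volume together, using the Euclidean norm of the two z-scores. The data, the threshold of 2.5, and the choice of combination rule are all hypothetical, and the norm implicitly treats the two series as uncorrelated.

```python
from statistics import mean, pstdev


def z_scores(series):
    """Standardise a series to zero mean and unit (population) deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


# Hypothetical daily changes for one stock; day 8 is mildly unusual in both.
price_change = [1, -1, 1, -1, 1, -1, 1, -1, 3, 0]
volume_change = [-1, 1, 1, -1, -1, 1, -1, 1, 3, 0]
THRESHOLD = 2.5  # arbitrary cut-off chosen for illustration

zp = z_scores(price_change)
zv = z_scores(volume_change)

# Outliers judged on price information only.
price_only = [i for i, z in enumerate(zp) if abs(z) > THRESHOLD]

# Outliers judged on price and volume together.
combined = [
    i for i, (a, b) in enumerate(zip(zp, zv))
    if (a * a + b * b) ** 0.5 > THRESHOLD
]
```

With these numbers, day 8 stays below the threshold on price alone but exceeds it once volume is taken into account. In a real system the thresholds for one and two measures would need to be calibrated separately.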

In the design of OMM, the key issue is how to integrate multiple time series. There are two main potential approaches for this. One is to integrate the multiple time series before the outlier mining process, and the other is to run outlier mining on individual time series first and then integrate the
