principle (Lam [Lam98]). Cooper [Coo90] showed that the general problem of infer-
ence in unconstrained belief networks is NP-hard. Limitations of belief networks, such
as their large computational complexity (Laskey and Mahoney [LM97]), have prompted
the exploration of hierarchical and composable Bayesian models (Pfeffer, Koller, Milch,
and Takusagawa [PKMT99] and Xiang, Olesen, and Jensen [XOJ00]). These follow an
object-oriented approach to knowledge representation. Fishelson and Geiger [FG02]
present a Bayesian network for genetic linkage analysis.
The perceptron is a simple neural network, proposed in 1958 by Rosenblatt [Ros58],
which became a landmark in early machine learning history. Its input units are ran-
domly connected to a single layer of output linear threshold units. In 1969, Minsky
and Papert [MP69] showed that perceptrons are incapable of learning concepts that
are linearly inseparable. This limitation, as well as limitations on hardware at the time,
dampened enthusiasm for research in computational neuronal modeling for nearly
20 years. Renewed interest was sparked following the presentation of the backpropaga-
tion algorithm in 1986 by Rumelhart, Hinton, and Williams [RHW86], as this algorithm
can learn concepts that are linearly inseparable.
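For readers who want a concrete picture, the following is a minimal sketch (not taken from any of the works cited above) of a Rosenblatt-style perceptron: a single linear threshold output unit trained with the classic error-correction rule. The function and parameter names (predict, fit, lr, epochs) are illustrative choices only.

import numpy as np

def predict(w, b, x):
    # Linear threshold unit: fire (output 1) if the weighted sum exceeds 0.
    return int(np.dot(w, x) + b > 0)

def fit(X, y, lr=0.1, epochs=100):
    # Classic perceptron rule: adjust the weights only when a tuple is misclassified.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - predict(w, b, xi)   # 0 if correct, +1 or -1 otherwise
            w += lr * err * xi
            b += lr * err
    return w, b

# AND is linearly separable, so the rule converges to a correct unit;
# XOR is not, which is the limitation shown by Minsky and Papert [MP69].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w, b = fit(X, np.array([0, 0, 0, 1]))
print([predict(w, b, x) for x in X])   # prints [0, 0, 0, 1]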
Since then, many variations of backpropagation have been proposed, involving, for
example, alternative error functions (Hanson and Burr [HB87]); dynamic adjustment
of the network topology (Mézard and Nadal [MN89]; Fahlman and Lebiere [FL90]; Le
Cun, Denker, and Solla [LDS90]; and Harp, Samad, and Guha [HSG90]); and dynamic
adjustment of the learning rate and momentum parameters (Jacobs [Jac88]). Other
variations are discussed in Chauvin and Rumelhart [CR95]. Books on neural networks
include Rumelhart and McClelland [RM86]; Hecht-Nielsen [HN90]; Hertz, Krogh, and
Palmer [HKP91]; Chauvin and Rumelhart [CR95]; Bishop [Bis95]; Ripley [Rip96]; and
Haykin [Hay99]. Many books on machine learning, such as Mitchell [Mit97] and Russell
and Norvig [RN95], also contain good explanations of the backpropagation algorithm.
Several techniques for extracting rules from trained neural networks have been
proposed [SN88, Gal93, TS93, Avn95, LSL95, CS96, LGT97]. The method
of rule extraction described in Section 9.2.4 is based on Lu, Setiono, and Liu [LSL95].
Critiques of techniques for rule extraction from neural networks can be found in Craven
and Shavlik [CS97]. Roy [Roy00] proposes that the theoretical foundations of neural
networks are flawed with respect to assumptions made regarding how connectionist
learning models the brain. An extensive survey of applications of neural networks in
industry, business, and science is provided in Widrow, Rumelhart, and Lehr [WRL94].
Support Vector Machines (SVMs) grew out of early work by Vapnik and
Chervonenkis on statistical learning theory [VC71]. The first paper on SVMs was
presented by Boser, Guyon, and Vapnik [BGV92]. More detailed accounts can be
found in books by Vapnik [Vap95, Vap98]. Good starting points include the tuto-
rial on SVMs by Burges [Bur98], as well as textbook coverage by Haykin [Hay08],
Kecman [Kec01], and Cristianini and Shawe-Taylor [CS-T00]. For methods for solving
optimization problems, see Fletcher [Fle87] and Nocedal and Wright [NW99]. These
references give additional details alluded to as “fancy math tricks” in our text, such
as transformation of the problem to a Lagrangian formulation and subsequent solving
using Karush-Kuhn-Tucker (KKT) conditions.
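To make that allusion slightly more concrete, the following is a standard sketch of the separable-case formulation (the notation w, b, x_i, y_i, and the multipliers \alpha_i is introduced here for illustration, not quoted from the references above). The maximum-margin problem is

\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \ \text{for every training tuple } (\mathbf{x}_i, y_i).

Introducing one Lagrange multiplier \alpha_i \ge 0 per constraint gives the Lagrangian

L_P = \tfrac{1}{2}\|\mathbf{w}\|^2 - \sum_i \alpha_i \left[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \right],

and the KKT conditions include \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i, \ \sum_i \alpha_i y_i = 0, and the complementary slackness condition \alpha_i \left[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \right] = 0, so only the tuples with \alpha_i > 0 (the support vectors) determine the solution.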
 