12 Large-Scale Machine Learning
Many algorithms are today classified as "machine learning." These algorithms share, with the other algorithms studied in this book, the goal of extracting information from data. All algorithms for analysis of data are designed to produce a useful summary of the data, from which decisions are made. Among many examples, the frequent-itemset analysis that we did in Chapter 6 produces information like association rules, which can then be used for planning a sales strategy or for many other purposes.
However, algorithms called "machine learning" not only summarize our data; they are perceived as learning a model or classifier from the data, and thus discovering something about data that will be seen in the future. For instance, the clustering algorithms discussed in Chapter 7 produce clusters that not only tell us something about the data being analyzed (the training set), but also allow us to classify future data into one of the clusters that result from the clustering algorithm. Thus, machine-learning enthusiasts often speak of clustering with the neologism "unsupervised learning"; the term unsupervised refers to the fact that the input data does not tell the clustering algorithm what the clusters should be. In supervised machine learning, which is the subject of this chapter, the available data includes information about the correct way to classify at least some of the data. The data already classified is called the training set.
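As a concrete illustration of this distinction (our own example, not from the text), a training set pairs each data item with its known class, while in the unsupervised setting the same data arrives without the labels:

```python
# Hypothetical training set for supervised learning: each example
# pairs a feature vector with the correct class label.  All names
# and values are illustrative only.
training_set = [
    ((5.1, 3.5), "class-A"),   # (features, known label)
    ((6.7, 3.1), "class-B"),
    ((4.9, 3.0), "class-A"),
    ((6.3, 2.9), "class-B"),
]

# In unsupervised learning (e.g., the clustering of Chapter 7),
# the algorithm would see only the feature vectors:
unlabeled_data = [features for features, _ in training_set]
```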
In this chapter, we do not attempt to cover all the different approaches to machine learning. We concentrate on methods that are suitable for very large data and that have the potential for parallel implementation. We consider the classical "perceptron" approach to learning a data classifier, where a hyperplane that separates two classes is sought. Then, we look at more modern techniques involving support-vector machines. Similar to perceptrons, these methods look for hyperplanes that best divide the classes, so that few, if any, members of the training set lie close to the hyperplane. We end with a discussion of nearest-neighbor techniques, where data items are classified according to the class(es) of their nearest neighbors in some space.
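To make two of these ideas concrete, here is a minimal sketch in Python (our own illustration under simplifying assumptions, not the book's code): a perceptron that learns a separating hyperplane through the origin for labels +1 and -1, and a nearest-neighbor classifier that labels a query point by majority vote among its k closest training examples.

```python
from collections import Counter

def perceptron(points, labels, eta=0.1, max_rounds=1000):
    """Learn weights w so that sign(w . x) matches each label y in {+1, -1}.
    Simplifying assumption: the hyperplane passes through the origin
    (no bias term)."""
    w = [0.0] * len(points[0])
    for _ in range(max_rounds):
        converged = True
        for x, y in zip(points, labels):
            # A point is misclassified when y and w . x disagree in sign.
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                # Nudge the hyperplane toward classifying x correctly.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                converged = False
        if converged:  # one full pass with no mistakes: done
            break
    return w

def knn_classify(query, examples, k=3):
    """Label query by majority vote among its k nearest training examples.
    Each example is a (feature-vector, label) pair."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(examples, key=lambda ex: dist2(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical, linearly separable data.
points = [(1.0, 2.0), (2.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [+1, +1, -1, -1]
print(perceptron(points, labels))                            # e.g. [0.1, 0.2]
print(knn_classify((1.5, 2.5), list(zip(points, labels))))   # +1
```

The nudging update and the majority vote are the essential ideas; the versions developed in this chapter add the refinements, such as bias terms and margins, needed to handle very large data well.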