Database Reference
In-Depth Information
according to three levels of risk: low, medium, and high. Regression models
can be used to predict the value of a commodity. The key difference between
both prediction and description is that prediction has a unique variable as
objective, while in descriptive problems no single variable is central to the
model. In this chapter, we focus on classification.
Pattern discovery aims at revealing either a regular behavior in a data
set or records that deviate from such a regular behavior. A typical example
of the former is the problem of finding sequential patterns inadataset.
In trac analysis, for instance, we can discover frequent routes of moving
objects like cars, trucks, and pedestrians. An example of finding irregular
behavior is to discover fraudulent credit card transactions. Another problem
related to pattern discovery arises when, given a pattern of interest, the user
wants to discover similar ones in the data set. This is used, for instance, to
find documents relevant to a set of keywords or images similar to a given one.
Typically, data mining algorithms have the following components:
￿ A model or pattern, for determining the underlying structure or functional
forms that we are looking for in the data.
￿ A score function, to assess the quality of the model.
￿ Optimization and search methods, to optimize the score function and
search over different models and patterns.
￿ Data management strategies, to handle data access eciently during
search and optimization.
In the next sections, we give a brief overview of the most important data
mining techniques and algorithms used to carry out the data mining tasks
described above. We will illustrate these using the Northwind data warehouse
given in Fig. 5.4 , to which we added two tables, depicted in Fig. 9.1 ,one
containing customer demographic data and another containing prospective
new customers. The latter will be used for prediction purposes, to forecast
the probability that a new customer places an order above a certain amount.
We explain next some of the attributes of these tables. The domain
of attribute BusinessType is the set
Minimart , Grocery , Supermarket ,
Hypermarket , Pub , Tavern , Cafe , Restaurant , Delicatessen
{
}
. The domain of
attribute OwnershipType is the set
Soletrader , Partnership , Cooperative ,
Limited liability company , Unlimited liability company , Corporation , Franchise
{
.
Attributes TotalEmployees and PermanentEmployees have the following
categorical values:
}
￿ 1: 1-19
￿ 2: 20-49
￿ 3: 50-99
￿ 4: 100-249
￿ 5: 250-499
￿ 6: 500-999
￿ 7: 1,000-2,500
￿ 8: Over 2,500
Search WWH ::




Custom Search