3-D spatial structures of genomes may coexist for certain biological objects. Mining
multiple data sources of complex data often leads to fruitful findings due to the mutual
enhancement and consolidation of such multiple sources. On the other hand, it is also
challenging because of the difficulties in data cleaning and data integration, as well as
the complex interactions among the multiple sources of such data.
While such data require sophisticated facilities for efficient storage, retrieval, and
updating, they also provide fertile ground and raise challenging research and
implementation issues for data mining. Data mining on such data is an advanced topic. The
methods involved are extensions of the basic techniques presented in this topic.
1.4 What Kinds of Patterns Can Be Mined?
We have observed various types of data and information repositories on which data
mining can be performed. Let us now examine the kinds of patterns that can be mined.
There are a number of data mining functionalities. These include characterization
and discrimination (Section 1.4.1); the mining of frequent patterns, associations, and
correlations (Section 1.4.2); classification and regression (Section 1.4.3); clustering
analysis (Section 1.4.4); and outlier analysis (Section 1.4.5). Data mining functionalities are
used to specify the kinds of patterns to be found in data mining tasks. In general, such
tasks can be classified into two categories: descriptive and predictive. Descriptive
mining tasks characterize properties of the data in a target data set. Predictive mining tasks
perform induction on the current data in order to make predictions.
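The distinction between the two categories can be illustrated with a minimal sketch. The monthly sales figures and the function names below are hypothetical and are not taken from the text; the least-squares line is just one simple way to perform induction on the current data, chosen for illustration.

```python
from statistics import mean, median

# Hypothetical monthly sales figures (illustrative data, not from the text).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive task: summarize properties of the data at hand."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(values):
    """Fit a least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    xs = list(range(n))
    x_bar = sum(xs) / n
    y_bar = sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_next(values):
    """Predictive task: induce a trend from current data and extrapolate one step."""
    slope, intercept = fit_line(values)
    return slope * len(values) + intercept


summary = describe(monthly_sales)       # describes the data already collected
forecast = predict_next(monthly_sales)  # estimates a value not yet observed
```

Here describe only reports on the existing data set, whereas predict_next uses the same data to estimate the next, unseen value.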
Data mining functionalities, and the kinds of patterns they can discover, are described
below. In addition, Section 1.4.6 looks at what makes a pattern interesting. Interesting
patterns represent knowledge.
1.4.1 Class/Concept Description: Characterization and Discrimination
Data entries can be associated with classes or concepts. For example, in the AllElectronics
store, classes of items for sale include computers and printers, and concepts of customers
include bigSpenders and budgetSpenders. It can be useful to describe individual classes
and concepts in summarized, concise, and yet precise terms. Such descriptions of a class
or a concept are called class/concept descriptions. These descriptions can be derived
using (1) data characterization, by summarizing the data of the class under study (often
called the target class) in general terms, or (2) data discrimination, by comparison of
the target class with one or a set of comparative classes (often called the contrasting
classes), or (3) both data characterization and discrimination.
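As a rough sketch of the two approaches, the following code characterizes a target class by averaging its attributes and discriminates it from a contrasting class by comparing those averages. The customer records, the attribute names (age, income), and the choice of the mean as the summary are assumptions made for illustration; only the concept names bigSpenders and budgetSpenders come from the text.

```python
from statistics import mean

# Hypothetical customer records; attribute names and values are illustrative only.
customers = [
    {"concept": "bigSpenders", "age": 40, "income": 80_000},
    {"concept": "bigSpenders", "age": 50, "income": 100_000},
    {"concept": "budgetSpenders", "age": 20, "income": 30_000},
    {"concept": "budgetSpenders", "age": 30, "income": 40_000},
]


def characterize(records, concept, attributes):
    """Data characterization: summarize the target class in general terms.

    The summary used here is simply the mean of each attribute.
    """
    rows = [r for r in records if r["concept"] == concept]
    return {a: mean(r[a] for r in rows) for a in attributes}


def discriminate(records, target, contrasting, attributes):
    """Data discrimination: compare the target class with a contrasting class.

    Returns, per attribute, the target-class mean minus the contrasting-class mean.
    """
    t = characterize(records, target, attributes)
    c = characterize(records, contrasting, attributes)
    return {a: t[a] - c[a] for a in attributes}


attrs = ["age", "income"]
big_profile = characterize(customers, "bigSpenders", attrs)
gap = discriminate(customers, "bigSpenders", "budgetSpenders", attrs)
```

With this toy data, big_profile summarizes the target class on its own, while gap shows how far the target class sits from the contrasting class on each attribute.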
Data characterization is a summarization of the general characteristics or features
of a target class of data. The data corresponding to the user-specified class are typically
collected by a query. For example, to study the characteristics of software products with
sales that increased by 10% in the previous year, the data related to such products can
be collected by executing an SQL query on the sales database.
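A query of this kind might look like the following sketch, run here against an in-memory SQLite database through Python's sqlite3 module. The sales table, its columns (product, category, year, amount), the sample rows, and the reading of "increased by 10%" as "at least 10% higher than the year before" are all assumptions made for illustration; the text does not specify the schema of the sales database.

```python
import sqlite3

# Hypothetical schema and rows; the real sales database is not described in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, category TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("EditorPro", "software", 2022, 1000.0),
        ("EditorPro", "software", 2023, 1200.0),  # +20%
        ("PhotoFix", "software", 2022, 2000.0),
        ("PhotoFix", "software", 2023, 2050.0),   # +2.5%
        ("LaserJet", "printer", 2022, 500.0),
        ("LaserJet", "printer", 2023, 900.0),     # not software
    ],
)

# Self-join each product's sales in a given year with its sales one year earlier,
# keeping software products whose sales grew by at least 10%.
QUERY = """
SELECT cur.product
FROM sales AS cur
JOIN sales AS prev
  ON prev.product = cur.product AND prev.year = cur.year - 1
WHERE cur.category = 'software'
  AND cur.year = ?
  AND cur.amount >= 1.10 * prev.amount
ORDER BY cur.product
"""


def target_class(connection, year):
    """Collect the user-specified target class for the given year."""
    return [row[0] for row in connection.execute(QUERY, (year,))]


selected = target_class(conn, 2023)
```

The rows returned by the query form the target class; summarizing their attributes is then the characterization step proper.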
 