categories consisting of objective measures and subjective measures [17].
Objective measures depend only on the structure of rules and the underlying
data used in the discovery process. Subjective measures also depend on the
user who examines the rules [17]. By comparison, studies of subjective
measures are limited. One example is the work of Silberschatz and Tuzhilin,
who proposed a subjective measure of rule interestingness based on the notion
of unexpectedness and defined in terms of a user belief system [17, 18].
Yao et al. [23] suggest that rule interestingness measures take three
forms: statistical, structural and semantic. Many measures, such as support,
confidence, independence, classification error, etc., are defined based on sta-
tistical characteristics of rules. A systematic analysis of such measures is given
by Yao et al. using a 2 × 2 contingency table induced by a rule [24, 27]. The
structural characteristics of rules have been considered in many measures. For
example, information, such as the disjunct size, attribute interestingness, the
asymmetry of classification rules, etc., can be used [4]. These measures reflect
the simplicity, ease of understanding, or applicability of rules. Although
statistical and structural information provides an effective indicator of the
potential effectiveness of a rule, its usefulness is limited. One also needs to
consider the semantic aspects of rules or explanations of rules [25].
Semantics-centered approaches are application and user dependent. In addition to
statistical information, one incorporates other domain-specific knowledge such
as user interest, utility, value, profit, actionability, and so on.
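
As a minimal sketch of how statistical measures can be read off the 2 × 2 contingency table induced by a rule A -> B, the Python fragment below computes support, confidence and independence. The formulas are the common textbook definitions, assumed here for illustration; the cited papers may define or normalize them differently. The class name `Contingency` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Counts of the 2 x 2 table induced by a rule A -> B (hypothetical layout)."""
    a_b: int          # objects satisfying both A and B
    a_not_b: int      # objects satisfying A but not B
    not_a_b: int      # objects satisfying B but not A
    not_a_not_b: int  # objects satisfying neither

    @property
    def n(self) -> int:
        """Total number of objects in the table."""
        return self.a_b + self.a_not_b + self.not_a_b + self.not_a_not_b


def support(t: Contingency) -> float:
    """Fraction of all objects that satisfy both A and B."""
    if t.n == 0:
        raise ValueError("empty contingency table")
    return t.a_b / t.n


def confidence(t: Contingency) -> float:
    """Fraction of the objects satisfying A that also satisfy B."""
    covered = t.a_b + t.a_not_b
    if covered == 0:
        raise ValueError("no object satisfies the antecedent A")
    return t.a_b / covered


def independence(t: Contingency) -> float:
    """P(A and B) / (P(A) * P(B)); a value of 1 means A and B are independent."""
    p_a = (t.a_b + t.a_not_b) / t.n
    p_b = (t.a_b + t.not_a_b) / t.n
    if p_a == 0 or p_b == 0:
        raise ValueError("A or B never occurs")
    return support(t) / (p_a * p_b)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    table = Contingency(a_b=30, a_not_b=10, not_a_b=20, not_a_not_b=40)
    print(support(table), confidence(table), independence(table))
```

With these made-up counts the support is 30/100, the confidence is 30/40, and the independence value is about 1.5, which under this definition indicates a positive association between A and B. All three values depend only on the counts, which is why such measures are user and domain independent.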
Measures defined by statistical and structural information may be viewed
as objective measures. They are user, application and domain independent.
For example, a pattern is deemed interesting if it has certain statistical
properties. These measures may be useful in the philosophy layer of the three-layered
framework. Different classes of rules can be identified based on statistical
characteristics, such as peculiarity rules (low support and high confidence),
exception rules (low support and high confidence, but complementary to other
high support and high confidence rules), and outlier patterns (far away from
the statistical mean) [28]. Semantic-based measures involve the user's
interpretation of domain-specific notions such as profit and actionability. They may
be viewed as subjective measures. Such measures are useful in the application
layer of the three-layered framework. The usefulness of rules is measured
and interpreted based on domain-specific notions.
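
The statistical classes of rules mentioned above can be illustrated with a small Python sketch. It is one possible reading, not the method of [28]: the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE`, the label "common" for high support and high confidence rules, and the test used for an exception rule (a low support, high confidence rule that specializes a common rule but reaches a different conclusion) are all assumptions made for illustration. Outlier patterns are not covered.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the text gives no numeric cut-offs.
MIN_SUPPORT = 0.10
MIN_CONFIDENCE = 0.80


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # conditions on the left-hand side
    consequent: str        # conclusion on the right-hand side
    support: float
    confidence: float


def is_common(rule: Rule) -> bool:
    """High support and high confidence."""
    return rule.support >= MIN_SUPPORT and rule.confidence >= MIN_CONFIDENCE


def is_rare_but_reliable(rule: Rule) -> bool:
    """Low support and high confidence."""
    return rule.support < MIN_SUPPORT and rule.confidence >= MIN_CONFIDENCE


def classify(rule: Rule, all_rules: list) -> str:
    """Label a rule as 'common', 'exception', 'peculiarity' or 'other'."""
    if is_common(rule):
        return "common"
    if not is_rare_but_reliable(rule):
        return "other"
    for other in all_rules:
        # Assumed exception test: the rule adds conditions to a common rule
        # (proper superset of its antecedent) yet concludes something different.
        if (is_common(other)
                and other.antecedent < rule.antecedent
                and other.consequent != rule.consequent):
            return "exception"
    return "peculiarity"


if __name__ == "__main__":
    # Made-up rules, for illustration only.
    rules = [
        Rule(frozenset({"bird"}), "flies", 0.40, 0.90),
        Rule(frozenset({"bird", "penguin"}), "does_not_fly", 0.02, 0.95),
        Rule(frozenset({"albino"}), "light_sensitive", 0.01, 0.90),
    ]
    for r in rules:
        print(sorted(r.antecedent), "->", r.consequent, ":", classify(r, rules))
```

The classification uses only support, confidence and rule structure, so it stays on the objective side of the distinction drawn above. Deciding whether an exception rule is actually worth acting on would require the domain-specific, subjective measures.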
4.3 Explanation-Oriented Data Mining on the Three-Layered Framework
To complement the extensive studies of the various data mining tasks, the
explanation task, and more specifically the concept of explanation-oriented
data mining, was first proposed in [26]. Some data mining technologies
cannot immediately create knowledge or guarantee knowledge generation;
they only retrieve, sort, quantify, organize and report information out of