comparison between the target and contrasting classes. The user can adjust the comparison description by applying drill-down, roll-up, and other OLAP operations to the target and contrasting classes, as desired.
The preceding discussion outlines a general algorithm for mining comparisons in databases. Unlike characterization, this algorithm involves synchronous generalization of the target class with the contrasting classes, so that all classes are compared at the same abstraction levels.
Example 4.14 mines a class comparison describing the graduate and undergraduate students at Big University.
Example 4.14 Mining a class comparison. Suppose that you would like to compare the general properties of the graduate and undergraduate students at Big University, given the attributes name, gender, major, birth_place, birth_date, residence, phone#, and gpa.
This data mining task can be expressed in DMQL as follows:
    use Big_University_DB
    mine comparison as “grad vs undergrad students”
    in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
    for “graduate students”
    where status in “graduate”
    versus “undergraduate students”
    where status in “undergraduate”
    analyze count%
    from student
Let's see how this typical example of a data mining query for mining comparison
descriptions can be processed.
First, the query is transformed into two relational queries that collect two sets of task-relevant data: one for the initial target-class working relation and the other for the initial contrasting-class working relation, as shown in Tables 4.8 and 4.9. This can also be viewed as the construction of a data cube, where the status {graduate, undergraduate} serves as one dimension, and the other attributes form the remaining dimensions.
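The first step can be sketched as a pair of selections over the student relation. The snippet below is only an illustration, not the book's implementation: it uses Python's standard sqlite3 module, a reduced set of attributes, and invented rows; the table layout and the helper name working_relation are assumptions made for this sketch.

```python
import sqlite3

# In-memory toy database; schema and rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, major TEXT, birth_date TEXT, gpa REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        ("Student A", "Physics", "1998-04-12", 3.8, "graduate"),
        ("Student B", "History", "1997-11-30", 3.6, "graduate"),
        ("Student C", "Biology", "2004-02-05", 3.1, "undergraduate"),
        ("Student D", "Physics", "2003-09-17", 2.9, "undergraduate"),
        ("Student E", "Chemistry", "2005-01-22", 3.4, "undergraduate"),
    ],
)


def working_relation(connection, status):
    """Collect the task-relevant attributes for one class (hypothetical helper)."""
    query = "SELECT name, major, birth_date, gpa FROM student WHERE status = ?"
    return connection.execute(query, (status,)).fetchall()


# One relational query per class, mirroring the two working relations.
target = working_relation(conn, "graduate")
contrasting = working_relation(conn, "undergraduate")
```

Both queries project onto the same attribute list, which is what lets the two working relations be compared column by column in the later steps.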
Second, dimension relevance analysis can be performed, when necessary, on the two classes of data. After this analysis, irrelevant or weakly relevant dimensions (e.g., name, gender, birth_place, residence, and phone#) are removed from the resulting classes. Only the highly relevant attributes are included in the subsequent analysis.
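The passage does not say which relevance measure is used. As one possible illustration, the sketch below scores attributes by information gain with respect to the class label, a common choice that is assumed here rather than taken from the text. The rows are invented, and the attribute values are assumed to be already generalized (for example, an age range instead of a birth date), because raw high-cardinality attributes such as name would otherwise score misleadingly high.

```python
from collections import Counter
from math import log2

# Invented toy rows; "age_range" stands in for a generalized birth date.
ROWS = [
    {"gender": "M", "age_range": "26...30", "status": "graduate"},
    {"gender": "F", "age_range": "26...30", "status": "graduate"},
    {"gender": "M", "age_range": "16...20", "status": "undergraduate"},
    {"gender": "F", "age_range": "16...20", "status": "undergraduate"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, class_attr="status"):
    """Reduction in class entropy obtained by partitioning rows on one attribute."""
    base = entropy([r[class_attr] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[class_attr])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


# Attributes whose gain falls below a chosen threshold would be dropped.
scores = {a: information_gain(ROWS, a) for a in ("gender", "age_range")}
```

In this toy data, gender splits each class evenly and so carries no information about status, while age_range separates the two classes completely; a threshold on the score would keep the latter and drop the former.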
Third, synchronous generalization is performed on the target class to the levels controlled by user- or expert-specified dimension thresholds, forming the prime target class relation. The contrasting class is generalized to the same levels as those in the prime target class relation, forming the prime contrasting class(es) relation, as presented in Tables 4.10 and 4.11. In comparison with undergraduate students, graduate students tend to be older and have a higher GPA in general.
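Synchronous generalization can be sketched by applying one and the same set of generalization functions to both classes and then aggregating each class into generalized tuples with a count%. Everything specific below is an assumption for illustration: the rows are invented, age is assumed to be already derived from the birth date, major is assumed to be at its target level, and the five-year age ranges and GPA cut-offs are hypothetical thresholds, not values from Tables 4.10 and 4.11.

```python
from collections import Counter


def age_range(age):
    """Map an age to a five-year range such as 21...25 (assumed concept hierarchy)."""
    low = (age - 1) // 5 * 5 + 1
    return f"{low}...{low + 4}"


def gpa_level(gpa):
    """Map a numeric GPA to a category; the cut-offs are hypothetical."""
    if gpa >= 3.75:
        return "excellent"
    if gpa >= 3.5:
        return "very_good"
    if gpa >= 3.0:
        return "good"
    return "fair"


def prime_relation(rows):
    """Generalize every row, merge identical tuples, and report count% per tuple."""
    counts = Counter(
        (r["major"], age_range(r["age"]), gpa_level(r["gpa"])) for r in rows
    )
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


# Invented working relations for the two classes.
GRADS = [
    {"major": "Science", "age": 27, "gpa": 3.8},
    {"major": "Science", "age": 28, "gpa": 3.9},
    {"major": "Business", "age": 24, "gpa": 3.6},
    {"major": "Science", "age": 23, "gpa": 3.2},
]
UNDERGRADS = [
    {"major": "Science", "age": 19, "gpa": 3.1},
    {"major": "Science", "age": 20, "gpa": 3.3},
    {"major": "Business", "age": 22, "gpa": 2.8},
    {"major": "Science", "age": 18, "gpa": 3.6},
]

# The same functions are applied to both classes, so their levels always match.
prime_target = prime_relation(GRADS)
prime_contrasting = prime_relation(UNDERGRADS)
```

Because both relations pass through the same functions, a generalized tuple in one class can be set directly against the corresponding tuple in the other, which is the property the comparison relies on.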