comparison between the target and contrasting classes. The user can adjust the
comparison description by applying drill-down, roll-up, and other OLAP operations to
the target and contrasting classes, as desired.
The preceding discussion outlines a general algorithm for mining comparisons
in databases. Unlike characterization, this algorithm generalizes the target class
synchronously with the contrasting classes, so that all classes are compared at the
same abstraction levels.
Example 4.14 mines a class comparison describing the graduate and undergraduate
students at Big University.
Example 4.14 Mining a class comparison. Suppose that you would like to compare the general
properties of the graduate and undergraduate students at Big University, given the attributes
name, gender, major, birth_place, birth_date, residence, phone#, and gpa.
This data mining task can be expressed in DMQL as follows:
use Big_University_DB
mine comparison as “grad vs undergrad students”
in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
for “graduate students”
where status in “graduate”
versus “undergraduate students”
where status in “undergraduate”
analyze count%
from student
Let's see how this typical example of a data mining query for mining comparison
descriptions can be processed.
First, the query is transformed into two relational queries that collect two sets of
task-relevant data: one for the initial target-class working relation and the other for the
initial contrasting-class working relation, as shown in Tables 4.8 and 4.9. This can also be
viewed as the construction of a data cube, where the status {graduate, undergraduate} serves
as one dimension, and the other attributes form the remaining dimensions.
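The first step can be sketched in Python as a selection on status followed by a projection onto the attributes named in the query. This is only an illustration: the sample rows, the `working_relation` helper, and the use of in-memory dictionaries instead of real relational queries are all assumptions, not part of the original example.

```python
# Attributes listed in the "in relevance to" clause of the query.
RELEVANT = ["name", "gender", "major", "birth_place",
            "birth_date", "residence", "phone#", "gpa"]

# Hypothetical rows of the student relation (illustrative values only).
students = [
    {"name": "s1", "gender": "M", "major": "CS", "birth_place": "Vancouver",
     "birth_date": "1996-03-02", "residence": "r1", "phone#": "p1",
     "gpa": 3.7, "status": "graduate"},
    {"name": "s2", "gender": "F", "major": "Physics", "birth_place": "Toronto",
     "birth_date": "2003-11-20", "residence": "r2", "phone#": "p2",
     "gpa": 3.1, "status": "undergraduate"},
    {"name": "s3", "gender": "F", "major": "CS", "birth_place": "Seattle",
     "birth_date": "1995-07-14", "residence": "r3", "phone#": "p3",
     "gpa": 3.9, "status": "graduate"},
]

def working_relation(rows, status_values, attributes):
    """Select rows whose status is in status_values, then project onto attributes."""
    return [{a: r[a] for a in attributes}
            for r in rows if r["status"] in status_values]

# Initial target-class and contrasting-class working relations.
target = working_relation(students, {"graduate"}, RELEVANT)
contrasting = working_relation(students, {"undergraduate"}, RELEVANT)
```

Both relations share the same attribute list, which is what later allows them to be generalized and compared dimension by dimension.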
Second, dimension relevance analysis can be performed, when necessary, on the two
classes of data. After this analysis, irrelevant or weakly relevant dimensions (e.g., name,
gender, birth_place, residence, and phone#) are removed from the resulting classes. Only
the highly relevant attributes are included in the subsequent analysis.
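The passage above does not say which relevance measure is used. One common choice is information gain with respect to the class label, sketched below. The sample rows, the `age_range` attribute, and the function names are hypothetical and serve only to show how a weakly relevant dimension scores lower than a highly relevant one.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, class_attr="status"):
    """Reduction in class entropy obtained by partitioning rows on attribute."""
    base = entropy([r[class_attr] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[class_attr])
    remainder = sum(len(p) / len(rows) * entropy(p)
                    for p in partitions.values())
    return base - remainder

# Hypothetical, already generalized rows (illustrative values only).
rows = [
    {"status": "graduate", "gender": "M", "age_range": "26...30"},
    {"status": "graduate", "gender": "F", "age_range": "26...30"},
    {"status": "undergraduate", "gender": "M", "age_range": "16...20"},
    {"status": "undergraduate", "gender": "F", "age_range": "16...20"},
]

gain_gender = information_gain(rows, "gender")   # gender does not separate the classes
gain_age = information_gain(rows, "age_range")   # age_range separates them fully
```

A dimension whose gain falls below a chosen threshold would be dropped. Note that information gain favors attributes with many distinct values, so an identifier-like attribute such as name is usually excluded by a separate distinct-value check rather than by this score.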
Third, synchronous generalization is performed on the target class to the levels
controlled by user- or expert-specified dimension thresholds, forming the prime target class
relation. The contrasting class is generalized to the same levels as those in the prime
target class relation, forming the prime contrasting class(es) relation, as presented in
Tables 4.10 and 4.11. In comparison with undergraduate students, graduate students
tend to be older and have a higher GPA in general.
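Synchronous generalization can be sketched as applying one shared set of concept-hierarchy mappings to both classes and then computing count% per generalized tuple. Everything specific below is an assumption made for illustration: the five-year age ranges, the GPA cutoffs and labels, the use of an `age` value in place of birth_date, and the sample rows.

```python
from collections import Counter

def generalize_age(age):
    """Map an age to a five-year range such as '21...25' (hypothetical hierarchy)."""
    low = (age - 1) // 5 * 5 + 1
    return f"{low}...{low + 4}"

def generalize_gpa(gpa):
    """Map a numeric GPA to a label (cutoffs are illustrative assumptions)."""
    if gpa >= 3.5:
        return "excellent"
    if gpa >= 3.0:
        return "very_good"
    if gpa >= 2.5:
        return "good"
    return "fair"

def prime_relation(rows):
    """Generalize every row with the shared mappings and return count% per tuple."""
    counts = Counter((generalize_age(r["age"]), generalize_gpa(r["gpa"]))
                     for r in rows)
    total = len(rows)
    return {key: 100.0 * n / total for key, n in counts.items()}

# Hypothetical working relations (illustrative values only).
graduates = [{"age": 27, "gpa": 3.8}, {"age": 28, "gpa": 3.6},
             {"age": 24, "gpa": 3.2}, {"age": 29, "gpa": 3.9}]
undergraduates = [{"age": 19, "gpa": 2.8}, {"age": 20, "gpa": 3.1},
                  {"age": 22, "gpa": 2.9}, {"age": 18, "gpa": 3.6}]

# Both classes pass through the same functions, so they end at the same levels.
prime_target = prime_relation(graduates)
prime_contrasting = prime_relation(undergraduates)
```

Because both classes are generalized by the same functions, tuples such as ("26...30", "excellent") are directly comparable across the two prime relations.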
 