form a hierarchy at the schema level, a user could define some intermediate levels
manually, such as “{Alberta, Saskatchewan, Manitoba} ⊂ prairies_Canada” and
“{British Columbia, prairies_Canada} ⊂ Western_Canada.”
3. Specification of a set of attributes, but not of their partial ordering: A user may
specify a set of attributes forming a concept hierarchy but not explicitly state
their partial ordering. The system can then try to generate the attribute
ordering automatically so as to construct a meaningful concept hierarchy.
“Without knowledge of data semantics, how can a hierarchical ordering for an
arbitrary set of nominal attributes be found?” Consider the observation that since
higher-level concepts generally cover several subordinate lower-level concepts, an
attribute defining a high concept level (e.g., country) will usually contain a smaller
number of distinct values than an attribute defining a lower concept level (e.g.,
street). Following this observation, a concept hierarchy can be generated automatically
from the number of distinct values per attribute in the given attribute set.
The attribute with the most distinct values is placed at the lowest hierarchy level. The
fewer distinct values an attribute has, the higher it is placed in the generated
concept hierarchy. This heuristic rule works well in many cases. Users or experts
may apply some local-level swapping or adjustments, when necessary, after
examining the generated hierarchy.
Let's examine an example of this third method.
Example 3.7 Concept hierarchy generation based on the number of distinct values per attribute.
Suppose a user selects a set of location-oriented attributes (street, country,
province_or_state, and city) from the AllElectronics database, but does not specify the
hierarchical ordering among the attributes.
A concept hierarchy for location can be generated automatically, as illustrated in
Figure 3.13. First, sort the attributes in ascending order based on the number of
distinct values in each attribute. This results in the following (where the number of
distinct values per attribute is shown in parentheses): country (15), province_or_state
(365), city (3567), and street (674,339). Second, generate the hierarchy from the top
down according to the sorted order, with the first attribute at the top level and the
last attribute at the bottom level. Finally, the user can examine the generated hierarchy
and, when necessary, modify it to reflect desired semantic relationships among the
attributes. In this example, the generated hierarchy already matches the intended
semantics, so no modification is needed.
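The two automatic steps of Example 3.7 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names distinct_counts and order_top_down are hypothetical, and the counts are the ones quoted in the example rather than values computed from the AllElectronics data.

```python
def distinct_counts(rows, attributes):
    """Count the distinct values each attribute takes across the rows."""
    return {a: len({row[a] for row in rows}) for a in attributes}


def order_top_down(counts):
    """Order attributes from fewest distinct values (top) to most (bottom)."""
    return sorted(counts, key=counts.get)


# Distinct-value counts quoted in Example 3.7 (not computed from real data).
example_counts = {
    "street": 674_339,
    "country": 15,
    "province_or_state": 365,
    "city": 3_567,
}

top_down = order_top_down(example_counts)
# Print in "lower < higher" notation, starting from the bottom level.
print(" < ".join(reversed(top_down)))
# prints: street < city < province_or_state < country
```

When the counts are not known in advance, distinct_counts can derive them from a list of records (here assumed to be dictionaries keyed by attribute name) before the ordering step.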
Note that this heuristic rule is not foolproof. For example, a time dimension in a
database may contain 20 distinct years, 12 distinct months, and 7 distinct days of the
week. However, this does not suggest that the time hierarchy should be
“year < month < days_of_the_week,” with days_of_the_week at the top of the hierarchy.
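The time-dimension counterexample can be reproduced with a short Python sketch that also leaves room for the user review step. The function generate_hierarchy and its user_order parameter are hypothetical names, and the corrected ordering passed in below is an assumed user choice for illustration; the text itself states only that days_of_the_week should not end up at the top.

```python
def generate_hierarchy(distinct_counts, user_order=None):
    """Return attributes from top level to bottom level.

    Without user_order, apply the distinct-value heuristic (fewest distinct
    values at the top). With user_order, use the user's ordering instead,
    after checking that it names exactly the same attributes.
    """
    if user_order is None:
        return sorted(distinct_counts, key=distinct_counts.get)
    if set(user_order) != set(distinct_counts) or len(user_order) != len(distinct_counts):
        raise ValueError("user_order must list each attribute exactly once")
    return list(user_order)


# Counts from the time-dimension example in the text.
time_counts = {"year": 20, "month": 12, "days_of_the_week": 7}

# Heuristic result: days_of_the_week lands at the top, which is not meaningful.
generated = generate_hierarchy(time_counts)

# Assumed user correction after examining the generated hierarchy.
reviewed = generate_hierarchy(
    time_counts, user_order=["year", "month", "days_of_the_week"]
)
```

Keeping the override as an explicit argument mirrors the workflow described above: the system proposes an ordering, and a user or expert adjusts it only where the distinct-value counts are misleading.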
4. Specification of only a partial set of attributes: Sometimes a user can be careless
when defining a hierarchy, or have only a vague idea about what should be included
in a hierarchy. Consequently, the user may have included only a small subset of the