possible existence constraints. However, mining functional dependencies is far
less trivial.
Back in 1995, Ram presented four types of approaches to derive functional
dependencies from an existing conceptual ER schema [5]. The first category con-
sists in using keyword analysis to identify intra-entity functional dependencies:
typically, attributes bearing a suffix or prefix such as “id” or “number” should be
considered potential determinants, while attributes bearing a suffix or prefix such
as “maximum”, “minimum”, “average” or “total” should be considered potential
dependent attributes. The second category consists in analysing the cardinalities
of binary relationships to identify inter-entity functional dependencies, typically
between their identifiers. The third category is similar, but concerns N-ary rela-
tionships. And finally, the fourth category consists in analysing sample data to
elicit undiscovered functional dependencies. These principles were supported by
the FDExpert tool.
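The keyword-analysis category can be sketched as a simple lexical heuristic. The following is a minimal illustration, not FDExpert's actual implementation; the attribute names in the usage lines are hypothetical, while the suffix/prefix keywords are those listed above.

```python
# Keyword hints taken from the text; attribute names below are invented examples.
DETERMINANT_HINTS = ("id", "number")
DEPENDENT_HINTS = ("maximum", "minimum", "average", "total")

def classify_attribute(name: str) -> str:
    """Flag an attribute as a potential determinant, a potential dependent
    attribute, or neither, based on its prefix/suffix keywords."""
    lowered = name.lower()
    if any(lowered.startswith(h) or lowered.endswith(h) for h in DETERMINANT_HINTS):
        return "potential determinant"
    if any(lowered.startswith(h) or lowered.endswith(h) for h in DEPENDENT_HINTS):
        return "potential dependent"
    return "unclassified"

print(classify_attribute("customer_id"))   # potential determinant
print(classify_attribute("total_amount"))  # potential dependent
```

Such a heuristic only proposes candidates; the resulting functional dependencies still have to be validated, for instance against sample data as in the fourth category.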
The first three categories rely on the analysis of the schema itself, while the
fourth category, known as the dependency discovery problem, focuses on the con-
tent of the database. Dependency discovery is a standard issue, especially in data
mining, database archiving, data warehousing and Online Analytical Processing
(OLAP). The most prominent existing algorithms dealing with this issue can be
classified into three categories that are difficult to compare qualitatively [6,7].
The first two categories explore the search space (i.e. the possible combinations
of the attributes of a given relation) as efficiently as possible, in order to test the
associated functional dependencies using a stripped partition database computed
from that relation. The candidate generate-and-test
approach progressively explores and prunes the search space in a levelwise man-
ner, while partitioning the database using attribute-based equivalence classes, as
in Huhtala et al.'s TANE [8], Novelli and Cicchetti's FUN [9], or Yao and Hamil-
ton's FD Mine [7]. The minimal cover approach structures the search space using
hypergraphs that are explored to discover the minimal cover of the set of FDs
for a given database, i.e. the minimal set of FDs from which the entire set of
FDs can be generated using the Armstrong axioms, as in DepMiner, proposed
by Lopes et al. [10] and FastFDs, proposed by Wyss et al. [11].
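The core test underlying these approaches can be illustrated with a deliberately naive levelwise search. The sketch below checks every candidate X → A against sample data, exploring left-hand sides from small to large; it omits the partition-based pruning that makes TANE, FUN or FD_Mine practical, and the sample rows in the test are invented.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in the sample data:
    every combination of lhs-values must map to a single rhs-value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def naive_fd_discovery(rows, attributes):
    """Exhaustive levelwise search (no pruning): test every candidate
    X -> A with A not in X, from small to large left-hand sides X."""
    fds = []
    for size in range(1, len(attributes)):
        for lhs in combinations(attributes, size):
            for a in attributes:
                if a not in lhs and holds(rows, lhs, (a,)):
                    fds.append((lhs, a))
    return fds
```

Real algorithms avoid this exponential enumeration: the generate-and-test family prunes candidates whose subsets already determine the attribute, and tests each survivor in near-linear time using the stripped partitions mentioned above.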
Finally, formal concept analysis (FCA) has also been used recently to find
and represent logical implications in datasets [12], mainly through a closure op-
erator from which concepts (closed sets) can be derived. For instance, Baixeries
uses Galois connections and concept lattices as a framework to find functional
dependencies [13], while Rancz et al. optimise an existing method introduced
by [14], which provides a direct translation from relational databases into the
language of power context families, in order to build inverted index files to opti-
mise the elicitation of functional dependencies in a relational table through the
construction of its formal context [15]. The latter authors also developed the
subsequent FCAFuncDepMine software to detect functional dependencies in re-
lational database tables [16]. Similar principles were also used in Flory's method,
which was based on the definition and analysis of a matrix and its associated
graph of functional dependencies [17].
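The closure operator central to these FCA-based approaches can be shown in a few lines. This is a generic sketch of attribute-set closure under a set of implications (the fixpoint of applying all applicable FDs), not the specific method of any of the cited works; the example implications are invented.

```python
def closure(attrs, fds):
    """Compute the closure of an attribute set under a set of implications
    (FDs), i.e. the smallest superset closed under every lhs -> rhs rule.
    Closed sets of this operator are the concepts exploited in FCA."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return closed

# Invented example: A -> B and B -> C, so the closure of {A} is {A, B, C}.
implications = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, implications))
```

An FD X → A then holds exactly when A belongs to the closure of X, which is how closed sets let FCA both find and compactly represent the implications of a dataset.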
 