possible existence constraints. However, mining functional dependencies is far
less trivial.
Back in 1995, Ram presented four types of approaches to derive functional
dependencies from an existing conceptual ER schema [5]. The first category con-
sists in using keyword analysis to identify intra-entity functional dependencies:
typically, attributes bearing a suffix or prefix such as “id” or “number” should be
considered potential determinants, while attributes bearing a suffix or prefix such
as “maximum”, “minimum”, “average” or “total” should be considered potential
dependent attributes. The second category consists in analysing the cardinalities
of binary relationships to identify inter-entity functional dependencies, typically
between their identifiers. The third category is similar, but concerns N-ary rela-
tionships. And finally, the fourth category consists in analysing sample data to
elicit undiscovered functional dependencies. These principles were supported by
the FDExpert tool.
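The keyword-analysis category can be sketched as a simple lexical heuristic. The following is a minimal illustration, not FDExpert's actual implementation; the attribute names in the usage lines are hypothetical, while the suffix/prefix keywords are those listed above.

```python
# Keyword hints taken from the text; attribute names below are invented examples.
DETERMINANT_HINTS = ("id", "number")
DEPENDENT_HINTS = ("maximum", "minimum", "average", "total")

def classify_attribute(name: str) -> str:
    """Flag an attribute as a potential determinant, a potential dependent
    attribute, or neither, based on its prefix/suffix keywords."""
    lowered = name.lower()
    if any(lowered.startswith(h) or lowered.endswith(h) for h in DETERMINANT_HINTS):
        return "potential determinant"
    if any(lowered.startswith(h) or lowered.endswith(h) for h in DEPENDENT_HINTS):
        return "potential dependent"
    return "unclassified"

print(classify_attribute("customer_id"))   # potential determinant
print(classify_attribute("total_amount"))  # potential dependent
```

Such a heuristic only proposes candidates; the resulting functional dependencies still have to be validated, for instance against sample data as in the fourth category.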
The first three categories rely on the analysis of the schema itself, while the
fourth category, known as the dependency discovery problem, focuses on the con-
tent of the database. Dependency discovery is a standard issue, especially in data
mining, database archiving, data warehousing and Online Analytical Processing
(OLAP). The most prominent existing algorithms dealing with this issue can be
classified into three categories that are difficult to compare qualitatively [6,7].
The first two categories explore the search space (i.e. the possible combinations
of the attributes of a given relation) as efficiently as possible, in order to test the
associated functional dependencies using a stripped partition database computed
from that relation. The candidate generate-and-test
approach progressively explores and prunes the search space in a levelwise man-
ner, while partitioning the database using attribute-based equivalence classes, as
in Huhtala et al.'s TANE [8], Novelli and Cicchetti's FUN [9], or Yao and Hamil-
ton's FD Mine [7]. The minimal cover approach structures the search space using
hypergraphs that are explored to discover the minimal cover of the set of FDs
for a given database, i.e. the minimal set of FDs from which the entire set of
FDs can be generated using the Armstrong axioms, as in DepMiner, proposed
by Lopes et al. [10] and FastFDs, proposed by Wyss et al. [11].
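The core test underlying these approaches can be illustrated with a deliberately naive levelwise search. The sketch below checks every candidate X → A against sample data, exploring left-hand sides from small to large; it omits the partition-based pruning that makes TANE, FUN or FD_Mine practical, and the sample rows in the test are invented.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in the sample data:
    every combination of lhs-values must map to a single rhs-value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def naive_fd_discovery(rows, attributes):
    """Exhaustive levelwise search (no pruning): test every candidate
    X -> A with A not in X, from small to large left-hand sides X."""
    fds = []
    for size in range(1, len(attributes)):
        for lhs in combinations(attributes, size):
            for a in attributes:
                if a not in lhs and holds(rows, lhs, (a,)):
                    fds.append((lhs, a))
    return fds
```

Real algorithms avoid this exponential enumeration: the generate-and-test family prunes candidates whose subsets already determine the attribute, and tests each survivor in near-linear time using the stripped partitions mentioned above.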
Finally, formal concept analysis (FCA) has also been used recently to find
and represent logical implications in datasets [12], mainly through a closure op-
erator from which concepts (closed sets) can be derived. For instance, Baixeries
uses Galois connections and concept lattices as a framework to find functional
dependencies [13], while Rancz et al. optimise an existing method introduced
by [14], which provides a direct translation from relational databases into the
language of power context families, in order to build inverted index files to opti-
mise the elicitation of functional dependencies in a relational table through the
construction of its formal context [15]. The latter authors also developed the
subsequent FCAFuncDepMine software to detect functional dependencies in re-
lational database tables [16]. Similar principles were also used in Flory's method,
which was based on the definition and analysis of a matrix and its associated
graph of functional dependencies [17].
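The closure operator central to these FCA-based approaches can be shown in a few lines. This is a generic sketch of attribute-set closure under a set of implications (the fixpoint of applying all applicable FDs), not the specific method of any of the cited works; the example implications are invented.

```python
def closure(attrs, fds):
    """Compute the closure of an attribute set under a set of implications
    (FDs), i.e. the smallest superset closed under every lhs -> rhs rule.
    Closed sets of this operator are the concepts exploited in FCA."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return closed

# Invented example: A -> B and B -> C, so the closure of {A} is {A, B, C}.
implications = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, implications))
```

An FD X → A then holds exactly when A belongs to the closure of X, which is how closed sets let FCA both find and compactly represent the implications of a dataset.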
 