2.2 DOMAIN Definition
The DOMAIN algorithm is presented in Fig. 1; Fig. 2 illustrates the behavior
of the algorithm.
The input for the method is the examined set of values for a given attribute,
provided as a multiset S_D; the output is the set D′ of the discovered domain
values, and ideally we expect D′ Δ D = ∅ (an empty symmetric difference). The
behavior of the algorithm is driven by two input parameters: ε, the textual
similarity threshold, and α, the multiplicity ratio threshold.
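To make the input and the success criterion concrete, the short sketch below represents the input multiset as a Python Counter and checks whether the symmetric difference between a discovered set and a reference domain is empty. The city names, the variable names and the helper domain_error are hypothetical and serve only as an illustration; they do not come from the original method.

```python
from collections import Counter

# Hypothetical attribute values; the Counter plays the role of the multiset S_D.
s_d = Counter(["London", "London", "Londn", "Paris", "Paris", "Pariss"])

# Reference domain (assumed known here only for evaluation purposes).
true_domain = {"London", "Paris"}

# A hypothetical output of a domain discovery method.
discovered = {"London", "Paris"}


def domain_error(found: set, reference: set) -> set:
    """Return the values that belong to exactly one of the two sets."""
    return found ^ reference  # set symmetric difference


error = domain_error(discovered, true_domain)
print(error == set())  # the ideal case: nothing missed, nothing spurious
```

A non-empty result would list both the domain values that were missed and the non-domain values that were wrongly reported.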
Fig. 1. The pseudocode of the DOMAIN method.
Fig. 2. A fragment of the graph structure after the application of the steps of the
algorithm. The sizes of the nodes represent the number of value occurrences. In Fig. A,
we created the graph with all the arcs satisfying the conditions of the relationship. In
Fig. B, we retained only the arcs representing the greatest similarity between the
values. In Fig. C, the node F has been transformed into a sink node, as the relation
between A and the values J, K, L did not occur. In Fig. D, we streamlined the graph and
identified the values A, B, C and D as the domain values.
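The graph steps described in the caption can be sketched roughly as below. This is not the pseudocode of Fig. 1, which is not reproduced here, but a minimal illustration under several assumptions: the similarity measure (difflib's SequenceMatcher ratio), the direction of the arcs (from a rarer value to a more frequent one), the reading of α as a minimum ratio of occurrence counts, and the names discover_domain and similarity are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in textual similarity in [0, 1]; the actual measure is an assumption.
    return SequenceMatcher(None, a, b).ratio()


def discover_domain(values, eps, alpha):
    """Sketch: build arcs, keep the most similar one per node, follow arcs to sinks.

    Assumes alpha > 1, so arcs always point to strictly more frequent values
    and the retained arcs cannot form a cycle.
    """
    counts = Counter(values)  # the input multiset

    # Steps A and B: among all arcs u -> v that satisfy both thresholds,
    # retain for each node u only the arc with the greatest similarity.
    best_arc = {}
    for u in counts:
        best = None
        for v in counts:
            if u == v:
                continue
            sim = similarity(u, v)
            if sim >= eps and counts[v] >= alpha * counts[u]:
                if best is None or sim > best[0]:
                    best = (sim, v)
        if best is not None:
            best_arc[u] = best[1]

    # Steps C and D: follow the retained arcs until a node without an
    # outgoing arc (a sink) is reached; sinks are reported as domain values.
    def sink(u):
        seen = set()
        while u in best_arc and u not in seen:  # guard against cycles
            seen.add(u)
            u = best_arc[u]
        return u

    mapping = {u: sink(u) for u in counts}
    return set(mapping.values()), mapping


values = ["London"] * 10 + ["Londn"] + ["Paris"] * 8 + ["Pariss"]
domain, mapping = discover_domain(values, eps=0.8, alpha=2)
print(sorted(domain))
```

With these hypothetical inputs, the misspelled values point to their frequent counterparts, and only the frequent values remain as sinks. The returned mapping can also be used to replace each observed value with its assigned domain value.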