Figure 3. Examples of fuzzy sets
The Fuzzy Set Theory
We propose to use the fuzzy set theory to
represent imprecise data. In this chapter, we use
the representation of fuzzy sets proposed in
(Zadeh, 1965).
Definition A fuzzy set f on a definition domain
D(f) is defined by a membership function µ_f from
D(f) to [0,1] that associates with each element x
of D(f) the degree to which x belongs to f.
We call support of f the subset of D(f) such that
support(f) = {a ∈ D(f) | µ_f(a) > 0}. We call
kernel of f the subset of D(f) such that
kernel(f) = {a ∈ D(f) | µ_f(a) = 1}.
We distinguish two kinds of fuzzy sets:
(i) discrete fuzzy sets and (ii) continuous fuzzy
sets.
Definition A discrete fuzzy set f is a fuzzy set
associated with a symbolic type of the ontology.
Its definition domain is the type hierarchy.
Definition A continuous fuzzy set f is a
trapezoidal fuzzy set associated with a numeric
type of the ontology. A trapezoidal fuzzy set is
defined by its four characteristic points, which
correspond to min(support(f)), min(kernel(f)),
max(kernel(f)) and max(support(f)). Its definition
domain is the interval of possible values of the
type.
The fuzzy set formalism can be used in three
different ways as defined in (Dubois & Prade,
1988): (i) in the database, in order to represent
imprecise data as an ordered disjunction of
exclusive possible values, (ii) in the database as
a result of the semantic annotation process, in
order to represent the similarity between a value
from the Web and values from the ontology, or
(iii) in the queries, in order to represent fuzzy
selection criteria which express the preferences
of the end-user.
Example 2
The fuzzy set ContaminationLevel_FS of Figure 3
is a continuous fuzzy set denoted [4,5,6,7]. It
represents the possible values of a level of
contamination. The fuzzy set
FoodProduct_Similarity is a discrete fuzzy set
denoted (0.66/rice + 0.5/rice flour). It
represents the set of terms of the ontology that
are similar, to different degrees, to the term
Basmati rice found in a document retrieved from
the Web. The fuzzy set FoodProduct_Preferences is
a discrete fuzzy set denoted
(1/rice + 0.5/cereal). Used in a query, it means
that the end-user is interested in information
about rice and, to a lesser degree, in
information about cereal.
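The three fuzzy sets of Example 2 can be written down as a short Python sketch. It is only an illustration, not the implementation used in the chapter: the class and function names are hypothetical, a discrete fuzzy set is assumed to be a plain mapping from ontology terms to degrees, and the trapezoidal membership function is assumed to be linear between the support bounds and the kernel bounds (the usual reading of "trapezoidal", which the text does not spell out).

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class TrapezoidalFuzzySet:
    """Continuous fuzzy set given by its four characteristic points."""
    a: float  # min(support(f))
    b: float  # min(kernel(f))
    c: float  # max(kernel(f))
    d: float  # max(support(f))

    def membership(self, x: float) -> float:
        # Inside the kernel the degree is 1.
        if self.b <= x <= self.c:
            return 1.0
        # At or beyond the support bounds the degree is 0.
        if x <= self.a or x >= self.d:
            return 0.0
        # Assumption: linear slopes between support and kernel bounds.
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.d - x) / (self.d - self.c)


def membership(fs: Dict[str, float], term: str) -> float:
    """Degree of a term in a discrete fuzzy set (0 if the term is absent)."""
    return fs.get(term, 0.0)


def support(fs: Dict[str, float]) -> Set[str]:
    """Terms whose membership degree is strictly positive."""
    return {term for term, degree in fs.items() if degree > 0}


def kernel(fs: Dict[str, float]) -> Set[str]:
    """Terms whose membership degree is exactly 1."""
    return {term for term, degree in fs.items() if degree == 1}


# The fuzzy sets of Example 2.
contamination_level = TrapezoidalFuzzySet(4, 5, 6, 7)
food_product_similarity = {"rice": 0.66, "rice flour": 0.5}
food_product_preferences = {"rice": 1.0, "cereal": 0.5}

if __name__ == "__main__":
    for x in (3.0, 4.5, 5.5, 6.5, 8.0):
        print(x, contamination_level.membership(x))
    print(support(food_product_preferences))
    print(kernel(food_product_preferences))
    print(kernel(food_product_similarity))
```

Under these assumptions, FoodProduct_Preferences has kernel {rice} and support {rice, cereal}, whereas FoodProduct_Similarity has an empty kernel because no term reaches degree 1.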
The Semantic Annotation Process
In order to deal with the data scarcity problem
of the CONTA local database, we propose to
extend the local database with data extracted
from the Web. We have designed for that purpose
a semi-automatic acquisition tool, called
@WEB (Annotating Tables from the WEB).
This tool relies on three steps, as described in
Figure 4. In the first step, documents relevant
to the application domain are retrieved using
the domain ontology, by means of crawlers and
RSS feeds. We focus on documents which contain
data tables. This may be seen as a restriction