user that posed the keyword query. For that reason, many evaluations of matching
or mapping tools are performed by human experts.
Another difficulty faced during the design of evaluation techniques for mapping
tools is the lack of a clear specification of the input language, i.e., a standardized
formalism with well-defined semantics. In contrast to benchmarks for relational
[Transaction Processing Performance Council 2001] and XML systems [Bohme and Rahm
2001] that could leverage the respective SQL and XQuery standard query
languages, it is still not clear how to describe a scenario. Formally describing the
schemas is not an issue, but describing the intended transformation, i.e., the input
that the designer needs to provide, is. The best way to unambiguously specify the
intended transformation is through a transformation language script, or a mapping
in some formalism, but there are two main issues with this option. First, there are
no guarantees that the mapping tool will be able to accept the specific formalism as
input, or at least that there will be an unambiguous translation of the input from the
formalism into the input language supported by the mapping tool. The second issue
is that such an approach defeats the purpose of a mapping tool, which is intended
to shield the mapping designer from the complexity and the peculiarities of the
transformation language. It is actually for that reason that mapping tool developers
have opted for simpler, higher-level specification languages, such as visual objects,
direct lines between schema elements, or the output of the matching process in general.
Unfortunately, such a specification is by nature ambiguous. Consider one of the
ambiguous situations already identified [Alexe et al. 2008c], described in Fig. 9.4. It
is a simple scenario in which the mapping designer needs to copy the company data
from the source into organizations data in the target. To specify this, the designer
draws the two interschema lines illustrated in Fig. 9.4. When these are fed to a popular
commercial mapping tool, the tool generates a transformation script that, when
executed on the instance of Fig. 9.4, produces the target instance illustrated in
Fig. 9.5a. A different tool, for the same input, produces a transformation script
that generates the instance illustrated in Fig. 9.5b. A third one produces a script
that generates the instance of Fig. 9.5c, which is most likely the one the mapping
designer had in mind. These differences are not errors on the part of
the tools, but rather a consequence of the fact that, in the absence of a global agreement
on the semantics of the matches, or the input language in general, different tools may
interpret them differently and may require different inputs for generating the same
mappings. In the above example, the tool that generated the instance in Fig. 9.5a
could have also produced the instance of Fig. 9.5c, if the designer had provided one
more match from the element Company to the element Organization. This match
(which is between nonleaf elements) is not allowed at all in the tool that created
the instance of Fig. 9.5c. The issue is also highly related to the level of intelligence
and reasoning capabilities that the tools are offering. Some tools may require a minimum
input from the user, and through advanced reasoning they may be able to
generate the intended mappings [Bonifati et al. 2008b; Fagin et al. 2009a]. Others
may require the designer to be more explicit when describing the transformation she
has in mind to create [Altova 2008; Stylus Studio 2005]. Even by considering only
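
The ambiguity of line-based specifications discussed above can be made concrete with a small sketch. It takes two leaf-to-leaf lines from a company record to an organization record and applies three different readings of them. The field names, the sample values, and the three functions are all assumptions introduced for illustration; they are not taken from Fig. 9.4 and are not claimed to reproduce the instances of Fig. 9.5 or the behavior of any actual tool.

```python
# Hypothetical source instance; field names and values are assumptions.
companies = [
    {"name": "Acme", "location": "Berlin"},
    {"name": "Globex", "location": "Oslo"},
]

# The two interschema lines, as (source leaf, target leaf) pairs.
lines = [("name", "title"), ("location", "address")]


def per_line(source, lines):
    """Reading 1: treat each line independently, one target element per value."""
    return [{tgt: rec[src]} for src, tgt in lines for rec in source]


def single_target(source, lines):
    """Reading 2: collect every copied value under a single target element."""
    return [{tgt: [rec[src] for rec in source] for src, tgt in lines}]


def per_source_element(source, lines):
    """Reading 3: one target element per source element, values kept together."""
    return [{tgt: rec[src] for src, tgt in lines} for rec in source]


# Same input, three different target instances.
for interpret in (per_line, single_target, per_source_element):
    print(interpret.__name__, interpret(companies, lines))
```

All three functions receive exactly the same lines, yet only the third keeps each company's values together in one organization. In this sketch the grouping is implicit in the third reading; a tool following the first reading would need an additional hint, such as a match between the parent elements, to reach the same result.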