Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping


equivalences [Fagin et al. 2008]. These optimizations are very important in applications in which mappings are required to be minimal for efficiency reasons. We discuss the recent approaches [Gottlob et al. 2009; Fagin et al. 2008] in Sect. 6.4.

6.1 Bridging Data and Metadata

HePToX [Bonifati et al. 2010, 2005] was the first system to introduce data-metadata correspondences, which drive the transformation from schema components in the source schema to instance values in the target schema and vice versa. Such novel correspondences enrich the semantics of the transformation while posing new research challenges. HePToX uses a Datalog-based mapping language called TreeLog; as an extension of SchemaLog, it handles schema and data on a par. TreeLog expressions are inferred from arrows and boxes, drawn in an ad hoc graphical notation, between elements in the source schema and instances in the target schema. By virtue of a bidirectional semantics for query answering, correspondences that involve data-metadata conflicts can also be traversed by collecting the components needed to answer the queries. Queries are expressed in XQuery and the underlying data in XML, to maintain the connection with TreeLog expressions, which are intrinsically nested.
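
A data-metadata correspondence can be illustrated with a small sketch in plain Python. This is not TreeLog syntax and not HePToX code: the stock-price records and the function names `metadata_to_data` and `data_to_metadata` are hypothetical, chosen only to show how schema components (column names) on one side can become instance values on the other, and back again.

```python
from typing import Dict, List

Row = Dict[str, object]


def metadata_to_data(rows: List[Row], key: str,
                     name_field: str, value_field: str) -> List[Row]:
    """Turn every non-key column name of the source into a target value."""
    out: List[Row] = []
    for row in rows:
        for column, value in row.items():
            if column == key:
                continue
            # The source column name (metadata) becomes a data value.
            out.append({key: row[key], name_field: column, value_field: value})
    return out


def data_to_metadata(rows: List[Row], key: str,
                     name_field: str, value_field: str) -> List[Row]:
    """Inverse direction: values of `name_field` become column names."""
    grouped: Dict[object, Row] = {}
    for row in rows:
        target = grouped.setdefault(row[key], {key: row[key]})
        # The data value becomes a schema component of the output.
        target[str(row[name_field])] = row[value_field]
    return list(grouped.values())


# Hypothetical source: company names appear as schema elements.
source = [
    {"date": "2010-01-04", "IBM": 130.9, "MSFT": 30.95},
    {"date": "2010-01-05", "IBM": 129.3, "MSFT": 30.96},
]

# Target: company names appear as instance values.
flat = metadata_to_data(source, "date", "company", "price")

# Going back recovers the original records.
assert data_to_metadata(flat, "date", "company", "price") == source
```

Note that the output columns of `data_to_metadata` depend on the values found in the input, so its result schema is only known once the instance is known.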

Recently, MAD (MetadatA-Data) mappings [Hernandez et al. 2008] have been studied as useful extensions in Clio [Popa et al. 2002]; they extend the basic mappings expressed as s-t tgds. Contrary to HePToX, such mappings are used for data exchange. To this purpose, dynamic output schemas are defined, since the result of data exchange cannot be determined a priori whenever it depends on the instances. MAD mappings in Clio are also generated from visual specifications, similarly to HePToX, and then translated to executable transformations. The translation algorithm has two steps: the first “shreds” the source data into views that offer a relational partitioning of the target schema, and the second restructures the result of the first step, also taking into account user-defined grouping in target schemas with nested sets.
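
The shred-then-restructure idea can be sketched on a toy example. The code below is not Clio's translation algorithm; it only mimics the two steps described above, under the assumption of a flat source and a target with one nested set. The relation and field names (`dept`, `emp`, `project`, `employees`) and the functions `shred` and `restructure` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shred(source_rows: List[Dict[str, str]]) -> Tuple[list, list]:
    """Step 1: build flat views, one per set in the (assumed) target schema."""
    dept_view = sorted({(r["dept"],) for r in source_rows})
    emp_view = sorted({(r["dept"], r["emp"]) for r in source_rows})
    return dept_view, emp_view


def restructure(dept_view: list, emp_view: list) -> List[dict]:
    """Step 2: nest the employee view under departments (grouping by dept)."""
    by_dept = defaultdict(list)
    for dept, emp in emp_view:
        by_dept[dept].append(emp)
    return [{"dept": dept, "employees": by_dept[dept]} for (dept,) in dept_view]


# Hypothetical flat source data.
source_rows = [
    {"dept": "R&D", "emp": "Ann", "project": "P1"},
    {"dept": "R&D", "emp": "Bob", "project": "P1"},
    {"dept": "R&D", "emp": "Ann", "project": "P2"},
    {"dept": "Sales", "emp": "Eve", "project": "P3"},
]

nested = restructure(*shred(source_rows))
# nested == [{"dept": "R&D", "employees": ["Ann", "Bob"]},
#            {"dept": "Sales", "employees": ["Eve"]}]
```

Keeping the two steps separate means the grouping applied in `restructure` can change without touching how the source is shredded.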

To summarize, Clio derives a set of MAD mappings from a set of lines between a source schema and a target schema. Applying these transformations computes a target instance that adheres to the target schema and to the correspondences. Similarly, HePToX derives a set of TreeLog mapping rules from element correspondences (i.e., boxes and arrows) between two schemas. TreeLog rules are similar in spirit to s-t tgds, although TreeLog has a second-order syntax. However, the problems solved by Clio and HePToX are different: in Clio the goal is data exchange, while in HePToX it is query reformulation in a highly distributed setting, as we will further discuss in Sect. 6.3.
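
For reference, the s-t tgds mentioned above have the following standard first-order form. The second formula is only an illustrative instance; the relation names Emp, Dept, and WorksIn are hypothetical and do not come from Clio or HePToX.

```latex
% General form of an s-t tgd: \varphi_S is a conjunction of atoms over the
% source schema S, \psi_T a conjunction of atoms over the target schema T.
\forall \mathbf{x}\, \bigl( \varphi_S(\mathbf{x}) \rightarrow
    \exists \mathbf{y}\, \psi_T(\mathbf{x}, \mathbf{y}) \bigr)

% Hypothetical instance: every source employee e in department d yields a
% target department d with some (unknown) manager m, and a WorksIn fact.
\forall e\, \forall d\, \bigl( \mathit{Emp}(e, d) \rightarrow
    \exists m\, ( \mathit{Dept}(d, m) \wedge \mathit{WorksIn}(e, d) ) \bigr)
```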
