Database Reference
In-Depth Information
of sources. Suitable data extraction and transfor-
mation techniques are required. In fact, the more
heterogeneous documents are, the more complex
data integration is. The key issue for integrating
systems that are more and more heterogeneous
is to understand them. Semantic Web techniques
will have an increasing role to play in the future
in order to facilitate this understanding. Indeed,
the concept of ontology, which makes it possible to
add semantic information to the Web, and the basic
representation languages for the Semantic Web,
which allow reasoning on the content of sources, are
the foundations of this understanding.
Reconciliation is an important information
integration problem. It also arises in other fields,
such as databases, when data from various sources
are combined. For example, mailing lists may
contain several entries representing the same
physical address, each entry containing different
spellings. Identifying matching records is chal-
lenging because there are no unique identifiers
across databases. Satisfactory solutions are not
available yet. In all the applications where this
problem arises, methods are needed that are
efficient, produce good results, and are robust to
changes of application domain.
Furthermore, since sources are increasingly
accessed from the Internet, additional problems
appear and have to be studied: managing data
freshness in order to store the freshest possible
data, assessing trust in the sources that provide
data, and taking access rights into account when
querying the most reliable sources.
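The mailing-list case mentioned above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a method described in this chapter: the abbreviation table, the sample addresses, and the function names `normalize` and `similarity` are all hypothetical, and the comparison relies on Python's standard `difflib.SequenceMatcher` rather than on any technique named in the text.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real domain would need its own.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and expand a few abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] computed on the normalized forms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Invented sample entries: no shared identifier, only differing spellings.
entries = [
    "12 Main St., Springfield",
    "12 Main Street, Springfield",
    "48 Oak Ave, Shelbyville",
]

same = similarity(entries[0], entries[1])       # two spellings of one address
different = similarity(entries[0], entries[2])  # two distinct addresses
```

Because no unique identifier links the entries, the only evidence available is the text itself, which is why normalization choices such as the abbreviation table weigh so heavily on the result.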
Generally speaking, automatic methods will be
of great importance in the future. Several directions
of research can be taken. Unsupervised methods
that guarantee 100% precision of the results when
the schema and data are error-free are one way to
automate reconciliation: the reconciliations and
non-reconciliations they produce are certain.
Capitalizing on experience, so that methods become
more efficient as they are applied, is another
interesting direction. For example, saving the correct
synonymies and non-synonymies inferred by L2R in a
dictionary illustrates such capitalization. It allows
the syntactic variations of an application domain to
be learned in an automatic and unsupervised way.
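The idea of capitalizing on earlier decisions can be sketched as a small lookup structure. This is a minimal illustration, assuming that inferred synonymies and non-synonymies are stored as unordered pairs of strings; the class `SynonymyDictionary`, its methods, and the sample values are hypothetical and do not reproduce how L2R stores its results.

```python
from typing import Dict, FrozenSet, Optional


class SynonymyDictionary:
    """Remembers pairs of values already decided to be synonymous or not."""

    def __init__(self) -> None:
        # Keys are unordered pairs, so (a, b) and (b, a) share one entry.
        self._decisions: Dict[FrozenSet[str], bool] = {}

    def record(self, a: str, b: str, synonymous: bool) -> None:
        """Store a decision inferred during an earlier reconciliation run."""
        self._decisions[frozenset((a, b))] = synonymous

    def lookup(self, a: str, b: str) -> Optional[bool]:
        """Return True or False if the pair is known, None otherwise."""
        return self._decisions.get(frozenset((a, b)))


# Invented sample values, for illustration only.
learned = SynonymyDictionary()
learned.record("Hotel Ritz", "Ritz Hotel", True)
learned.record("Ritz Hotel", "Ritz Museum", False)

known = learned.lookup("Ritz Hotel", "Hotel Ritz")   # order does not matter
unknown = learned.lookup("Ritz Hotel", "Ritz Cafe")  # never seen before
```

A later run can consult `lookup` before doing any comparison, which is one way a method could become more efficient as it is applied.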
Finally, the demand for methods that ensure
good results and that can be applied to new
data again and again while remaining as efficient
as ever will increase. Today it is difficult to
estimate in advance the precision of a system
when it is applied to a new set of data. As
a consequence, two research objectives should be
favored in the near future. The first is to develop
generic methods that guarantee sure results (a
logical method of the kind of L2R, for example).
Such methods are very interesting, but they cannot
be used in every case, especially when the data
is “dirty” or the global schema is an integrated
schema resulting from an automatic matching
process. Furthermore, they must be complemented by
other methods in order to obtain better recall. The
second objective is to propose methods that reconcile
data on the basis of similarity scores (not
necessarily 100%), designed together with mechanisms
capable of reasoning on uncertain reconciliation
decisions. Uncertainty management will therefore
become a major challenge. The uncertainty gathered
in data warehouses while populating them will have
to be exploited by reasoning on the traces of
reconciliation decisions.
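One way to picture score-based reconciliation with retained uncertainty is to keep every decision, together with its score, in a trace that can be examined later. The sketch below is an assumption-laden illustration: the `Decision` record, the `decide` function, the two thresholds, and the record identifiers are all hypothetical, and the scores are supplied by hand rather than computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    left: str
    right: str
    score: float
    verdict: str  # "reconciled", "not_reconciled", or "uncertain"


def decide(left: str, right: str, score: float,
           accept: float = 0.9, reject: float = 0.5) -> Decision:
    """Classify a pair from its similarity score (thresholds are arbitrary)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= accept:
        verdict = "reconciled"
    elif score <= reject:
        verdict = "not_reconciled"
    else:
        verdict = "uncertain"
    return Decision(left, right, score, verdict)


# The trace keeps every decision with its score, not only the final verdict.
trace: List[Decision] = [
    decide("src1:rec17", "src2:rec4", 0.97),
    decide("src1:rec18", "src2:rec9", 0.72),
    decide("src1:rec19", "src2:rec2", 0.20),
]

# Uncertain decisions stay available for later reasoning or review.
to_review = [d for d in trace if d.verdict == "uncertain"]
```

Keeping the score alongside the verdict means a later process can revisit borderline pairs when new evidence arrives, instead of treating every stored reconciliation as equally reliable.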
CONCLUSION
We have presented an information integration
approach able to extract, transform and integrate
data in a data warehouse guided by an ontology.
Whatever the application domain, the approach
can be applied to XML sources that are
valid documents and that have to be integrated
into an RDF data warehouse whose data are described
in terms of an RDFS ontology. Mappings between the
external sources and the ontology are represented
declaratively and are defined separately
from the extraction process. Extraction
operates on any XML document given mappings