through a DBMS. Two types of DBMSs currently
support XML data storage, management, and
query processing and optimization: relational,
XML-enabled systems that map XML data into
relational tables, and XML-native DBMSs.
In our approach, we focus on XML-native
systems, which treat XML documents as the
fundamental unit of storage. They define a
specific storage schema for XML documents,
and store and retrieve them according to this
model, which includes elements, attributes,
their contents and order. Moreover, many XML
databases provide a logical model for grouping
XML documents, called collections or libraries.
XML-native DBMSs also implement XML query
engines supporting XPath and XQuery. In addition,
some XML-native DBMSs support the XML:DB
Application Programming Interface (API), which
provides a form of implementation-independent
access to XML data.
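
As a rough illustration of querying a collection of XML documents with a path expression, the following minimal sketch uses Python's standard xml.etree.ElementTree module, which supports only a limited XPath subset and is not an XML-native DBMS. The in-memory collection, the document names, and the element and attribute names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "collection": document name -> XML text.
collection = {
    "case1.xml": "<case id='1'><patient age='54'/><image file='a.ljpeg'/></case>",
    "case2.xml": "<case id='2'><patient age='61'/>"
                 "<image file='b.ljpeg'/><image file='c.ljpeg'/></case>",
}


def query_collection(docs, path):
    """Evaluate a path expression against every document of the collection.

    Returns a list of (document name, matching element) pairs.
    """
    results = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        for node in root.findall(path):
            results.append((name, node))
    return results


# Collect the file attribute of every image element, per document.
image_files = [
    (name, node.get("file"))
    for name, node in query_collection(collection, "./image")
]
```

A real XML-native system would evaluate full XPath or XQuery inside its own query engine; the loop over documents here only mimics the idea of querying a collection as a whole.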
Query engine performance is the primary
criterion when selecting an XML-native DBMS. A
good candidate system must be capable of running
complex queries with reasonable response times
and of storing large volumes of data. This may
currently be seen as a weak point in our choice,
since XML-native DBMSs are not mature yet, but
we are also working in parallel on optimizing
their performance (Mahboubi et al., 2006;
Mahboubi et al., 2008; Mahboubi et al., 2008a).
struct comparable to the SQL group by clause,
in order to allow common business analysis (i.e.,
OLAP-like) queries (Mahboubi et al., 2006).
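
The effect of a grouping construct comparable to SQL's group by can be illustrated outside XQuery. The sketch below is written in Python, not in the extended XQuery syntax, which this excerpt does not show. It groups fact elements by one child element and sums a measure; the facts document and its element names (fact, year, region, amount) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fact document; element names are assumptions for the example.
facts_xml = """
<facts>
  <fact><year>2005</year><region>north</region><amount>10</amount></fact>
  <fact><year>2005</year><region>south</region><amount>5</amount></fact>
  <fact><year>2006</year><region>north</region><amount>7</amount></fact>
</facts>
"""


def group_sum(root, key_tag, measure_tag):
    """Sum measure_tag over all fact elements, grouped by key_tag."""
    totals = defaultdict(float)
    for fact in root.findall("fact"):
        totals[fact.findtext(key_tag)] += float(fact.findtext(measure_tag))
    return dict(totals)


totals_by_year = group_sum(ET.fromstring(facts_xml), "year", "amount")
```

Grouping by region instead only requires passing "region" as the key, which is the kind of flexibility an OLAP-like query needs.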
The second analysis subcomponent, MiningCubes,
is a Web-based application that includes a set
of on-line analysis and mining components
(BenMessaoud et al., 2006a). Analysis components
aim at loading data and performing
multidimensional explorations (through two- or
three-dimensional views). Data mining components
implement methods such as agglomerative
hierarchical clustering or frequent itemset
mining. From a user's point of view, MiningCubes
integrates these components in a transparent way.
For instance, a factorial approach can be used to
represent and reorganize relevant OLAP facts,
association rules may be mined from an OLAP
cube, or clustering may be exploited to aggregate
non-additive dimension members in roll-up/drill-
down operations.
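
To make the notion of frequent itemset mining concrete, here is a minimal sketch of the general technique, not of MiningCubes itself. It enumerates every itemset of each transaction exhaustively, which is only practical for tiny inputs; algorithms such as Apriori prune this search. The transactions and the support threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset contained in at least min_support transactions.

    Itemsets are represented as sorted tuples mapped to their support count.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count each non-empty subset of the transaction once.
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical transactions, e.g. dimension members co-occurring in facts.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
frequent = frequent_itemsets(transactions, min_support=2)
```

Association rules are then derived from such frequent itemsets by comparing the support of an itemset with that of its subsets.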
Case Study
Let us now apply the X-WACoDa approach to a
real-world application domain and consider
complex data from the Digital Database for
Screening Mammography (DDSM). DDSM gathers 2604
medical history cases of anonymous patients. Each
case contains an ASCII text file representing
general information about a patient and four
LJPEG radiography image files. These data come
from multiple sources and are encoded in
different file types, and may thus be considered
complex.
The first step in our approach is to transform
DDSM data into XML format to guarantee a
homogeneous representation and to allow data
integration into an XML data warehouse. Each
such XML document describes one whole medical
case by gathering study information (date,
examination...), patient information, and
radiography image descriptors (file name, URL,
scanner resolution...). All documents bear the
same structure and are valid against an XML
Schema.
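
The transformation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual X-WACoDa document structure: the element and attribute names (case, study, patient, images, image) and all sample values are hypothetical, and validation against an XML Schema is left out.

```python
import xml.etree.ElementTree as ET


def case_to_xml(study, patient, images):
    """Build one XML document describing a whole medical case.

    study and patient are dicts of simple fields; images is a list of
    dicts holding image descriptors (file name, URL, resolution...).
    """
    case = ET.Element("case")
    study_el = ET.SubElement(case, "study")
    for field, value in study.items():
        ET.SubElement(study_el, field).text = str(value)
    patient_el = ET.SubElement(case, "patient")
    for field, value in patient.items():
        ET.SubElement(patient_el, field).text = str(value)
    images_el = ET.SubElement(case, "images")
    for descriptor in images:
        # Image descriptors are stored as attributes of an image element.
        ET.SubElement(images_el, "image",
                      {k: str(v) for k, v in descriptor.items()})
    return case


# Invented sample values standing in for one parsed DDSM case.
case = case_to_xml(
    study={"date": "1999-01-01", "examination": "screening"},
    patient={"age": 60},
    images=[{"file": f"view{i}.ljpeg", "resolution": 50} for i in range(1, 5)],
)
document_text = ET.tostring(case, encoding="unicode")
```

Because every case goes through the same function, all resulting documents share one structure, which is what makes validation against a single schema possible.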
Analysis Component
X-WACoDa's analysis component consists of two
subcomponents. First, an ad hoc reporting
application helps users create specific and
customized decision support queries. This
application is based on XML:DB and supports
database connection as well as sending queries
and saving their results in XML format.
Analytical queries are expressed in XQuery. We
selected XQuery because it can express
complex queries over multiple XML
documents. In addition, we extended XQuery's
FLWOR clauses with an explicit grouping con-