X-WACoDa - Data Warehousing Design and Advanced Engineering Applications


through a DBMS. Two types of DBMSs currently support XML data storage, management, and query processing and optimization: XML-enabled relational systems, which map XML data into relational tables, and XML-native DBMSs.

In our approach, we focus on XML-native systems, which treat XML documents as the fundamental unit of storage. They define a specific XML storage schema for XML documents, and store and retrieve them according to this model, which includes elements, attributes, their contents and order. Moreover, many XML databases provide a logical model for grouping XML documents, called collections or libraries. XML-native DBMSs also implement XML query engines supporting XPath and XQuery. In addition, some XML-native DBMSs support the XML:DB Application Programming Interface (API), which provides a form of implementation-independent access to XML data.
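As a rough illustration of the document-centric model described above, the following minimal sketch mimics a collection of XML documents and a path query over it. It is not tied to any particular XML-native DBMS or to the XML:DB API: the collection is a plain Python dictionary, the document contents and the `query_collection` helper are hypothetical, and Python's standard `xml.etree.ElementTree` module (which supports only a subset of XPath) stands in for a real query engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical "collection": document name -> parsed XML document.
# Element and attribute names are illustrative assumptions.
collection = {
    "case1.xml": ET.fromstring(
        "<case><patient age='52'/>"
        "<image view='CC' file='a.ljpeg'/>"
        "<image view='MLO' file='b.ljpeg'/></case>"
    ),
    "case2.xml": ET.fromstring(
        "<case><patient age='61'/>"
        "<image view='CC' file='c.ljpeg'/></case>"
    ),
}


def query_collection(docs, path):
    """Apply an ElementTree path expression to every document in the collection."""
    results = []
    for doc_name, root in docs.items():
        for node in root.findall(path):
            results.append((doc_name, node))
    return results


# Select the file names of all images taken with the (hypothetical) 'CC' view.
cc_files = [
    node.get("file")
    for _, node in query_collection(collection, ".//image[@view='CC']")
]
print(cc_files)  # ['a.ljpeg', 'c.ljpeg']
```

A real XML-native DBMS would evaluate full XPath or XQuery expressions server-side over persistent collections; the dictionary here only conveys the idea of querying many documents as one logical unit.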

Query engine performance is the primary criterion when selecting an XML-native DBMS. A good candidate system must be capable of performing complex queries in reasonable response times and of storing large volumes of data. This may currently be seen as a weak point in our choice, since XML-native DBMSs are not mature yet, but we are also working in parallel on optimizing their performance (Mahboubi et al., 2006; Mahboubi et al., 2008; Mahboubi et al., 2008a).

Analysis Component

X-WACoDa's analysis component consists of two subcomponents. First, an ad hoc reporting application helps users create specific, customized decision support queries. This application is based on XML:DB and handles database connection, query submission, and saving query results in XML format. Analytical queries are expressed in XQuery. We selected XQuery because it allows performing complex queries over multiple XML documents. In addition, we extended XQuery's FLWOR clauses with an explicit grouping construct comparable to the SQL group by clause, in order to allow common business analysis (i.e., OLAP-like) queries (Mahboubi et al., 2006).

The second analysis subcomponent, MiningCubes, is a Web-based application that includes a set of on-line analysis and mining components (BenMessaoud et al., 2006a). Analysis components aim at loading data and performing multidimensional explorations (through two- or three-dimensional views). Data mining components implement methods such as agglomerative hierarchical clustering or frequent itemset mining. From a user's point of view, MiningCubes integrates these components in a transparent way. For instance, a factorial approach can be used to represent and reorganize relevant OLAP facts, association rules may be mined from an OLAP cube, or clustering may be exploited to aggregate non-additive dimension members in roll-up/drill-down operations.

Case Study

Let us now apply the X-WACoDa approach to a real-world application domain and consider complex data from the Digital Database for Screening Mammography (DDSM). DDSM gathers 2604 medical history cases of anonymous patients. Each case contains an ASCII text file representing general information about a patient and four LJPEG radiography image files. These data originate from multiple sources and are encoded through different file types, and may thus be considered complex.

The first step in our approach is to transform DDSM data into XML format to guarantee a homogeneous representation and to allow data integration into an XML data warehouse. Each resulting XML document describes one whole medical case by gathering study information (date, examination...), patient information, and radiography image descriptors (file name, URL, scanner resolution...). All documents bear the same structure and are valid against an XML Schema.
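To give a feel for the group-by-style (OLAP-like) analysis that the reporting application targets, the minimal sketch below aggregates a few XML case documents by one dimension. It is only an illustration under stated assumptions: plain Python stands in for the extended XQuery, and the element names, attribute names, sample values, and the `group_cases` helper are all hypothetical rather than taken from the actual DDSM schema or the authors' implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical case documents; the structure is an illustrative assumption.
CASES = [
    "<case id='c1'><patient age='50'/><image scanner='A'/></case>",
    "<case id='c2'><patient age='60'/><image scanner='B'/></case>",
    "<case id='c3'><patient age='70'/><image scanner='A'/></case>",
]


def group_cases(xml_docs):
    """Group cases by scanner; return the case count and average patient age per group."""
    ages_by_scanner = defaultdict(list)
    for xml_text in xml_docs:
        root = ET.fromstring(xml_text)
        scanner = root.find("image").get("scanner")   # grouping key
        age = int(root.find("patient").get("age"))    # measure to aggregate
        ages_by_scanner[scanner].append(age)
    return {
        scanner: {"cases": len(ages), "avg_age": sum(ages) / len(ages)}
        for scanner, ages in ages_by_scanner.items()
    }


summary = group_cases(CASES)
print(summary)
# {'A': {'cases': 2, 'avg_age': 60.0}, 'B': {'cases': 1, 'avg_age': 60.0}}
```

This is roughly what a SQL query with `GROUP BY scanner` would compute over a relational table; an explicit grouping construct in XQuery lets the same kind of aggregation be written directly against the XML documents.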
