A Multi-driven Approach to Improve Data Analytics
for Multi-value Dimensions
Gabriel Pestana 1 , Pedro Catelas 2 , and Isabel Rosa 3
1 Universidade Europeia, Lisbon, Portugal
gabriel.pestana@europeia.pt
2 INOV INESC Inovação - Instituto de Novas Tecnologias, Lisbon, Portugal
pedro.catelas@inov.pt
3 Instituto da Construção e do Imobiliário, I.P. (InCI), Lisbon, Portugal
Isabel.rosa@inci.pt
Abstract. A Data Warehouse is a data repository designed to provide accurate
and useful information, so that business stakeholders can conduct data analysis
that supports decision-making processes and improves information resources.
The data warehouse provides a single, detailed view of the organization and is
intended to be exploited by means of OLAP (On-line Analytical Processing)
tools. These tools facilitate information analysis and navigation through the
business data based on the multidimensional paradigm. A crucial decision in
designing multidimensional models concerns the grain of facts, which is
determined by fact-dimension relationships. This means that the accuracy of the
information can depend on how the data model is structured to support
multi-value dimensions and avoid double counting. The paper presents a
technique to overcome these constraints, enabling designers to abstract
complexity at a conceptual level without having to take into account more
complex schema structures (such as bridge tables) to deal with non-strict
fact-dimension relationships at different granularities. The technique is
demonstrated using the Pentaho tool and lessons learned from our case study, an
information system to monitor the execution of public works contracts.
Keywords: Multidimensional Schema Design, Requirements Analysis, Multi-
Value Dimensions.
1 Introduction
Nowadays, the standard approach for a Data Warehouse (DW) implementation is to
rely on multidimensional models, so that it can be exploited by OLAP tools. One
fact table and several dimensions of analysis form the dominant design patterns
known as the Star and Snowflake schemas [1]. Both schemas are designed with
one-to-many relationships in mind. However, they do not function well when
faced with a many-to-many relationship between facts and dimensions (i.e., a
Multi-Valued Dimension). Although research in this field has made considerable
progress, there is still a lack of a comprehensive understanding of data
modelling methodologies in data warehousing [2].
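
The double-counting problem that a many-to-many relationship introduces can be illustrated with a minimal sketch. The data, the names (contracts, contract_contractors), and the equal-split weighting below are hypothetical and chosen only for illustration; the sketch shows the general problem and one common allocation workaround, not the technique proposed in this paper.

```python
# Hypothetical facts: the value of each public works contract.
contracts = {"C1": 100.0, "C2": 60.0}

# Hypothetical multi-valued dimension: contract C1 has two contractors,
# so the fact-dimension relationship is many-to-many.
contract_contractors = [
    ("C1", "Alpha"),
    ("C1", "Beta"),
    ("C2", "Alpha"),
]


def naive_total(facts, links):
    """Sum fact values after joining to the dimension.

    A fact linked to several dimension members is repeated once per link,
    so its value is counted more than once.
    """
    return sum(facts[contract] for contract, _ in links)


def total_by_contractor(facts, links):
    """Split each fact value evenly across its dimension members.

    Equal weighting is an assumption made for this sketch.
    """
    members_per_contract = {}
    for contract, _ in links:
        members_per_contract[contract] = members_per_contract.get(contract, 0) + 1

    totals = {}
    for contract, contractor in links:
        share = facts[contract] / members_per_contract[contract]
        totals[contractor] = totals.get(contractor, 0.0) + share
    return totals


true_total = sum(contracts.values())
joined_total = naive_total(contracts, contract_contractors)
weighted = total_by_contractor(contracts, contract_contractors)

print(true_total)    # 160.0
print(joined_total)  # 260.0, because C1 is counted twice
print(weighted)      # {'Alpha': 110.0, 'Beta': 50.0}
```

In this sketch the joined total exceeds the true total because contract C1 appears once per contractor, whereas the weighted per-contractor totals still add up to the true total.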