Trends and Challenges to
Data Warehouse Architecture
from operational systems. Many industry practices
indicate that data quality should in fact be
controlled across the whole enterprise lifecycle
rather than only at the data warehouse.
Performance management is normally a compulsory
practice for any data warehouse. The study
of database theory and ETL efficiency (Thomsen,
Pedersen, & Lehner, 2008; Luo, Naughton, Ellmann,
& Waltzk, 2006) provides the foundation
for many performance tuning features and
practices in the database systems and ETL tools
of different vendors. Many database vendors are
currently facing the challenge of managing
hundreds of terabytes of data or more. The research
work on parallel processing and data warehouse
management (Datta, VanderMeer, & Ramamritham,
2002; Furtado, 2004) points in the direction in
which the industry is heading. Data federation
(Haas, Lin, & Roth, 2002) is another approach
to managing large-volume data warehouses.
The OLAP cube is one of the most widely used
data access tools in data warehouses. The research
community has studied the data cube broadly
since the paper by Gray et al. (1996). Many research
results in the area of data mining, such as
clustering, nearest neighbor search, and neural
networks, have been put into practice by different
BI vendors. In addition, research on in-memory
database and OLAP technologies (Lehman & Carey,
1986; Ross, 2004) has been implemented in different
desktop-based analytical tools.
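As a rough illustration of the data cube idea, the following Python sketch sums a measure over every subset of a fact table's dimensions and marks each rolled-up dimension with the placeholder value "ALL". The function name `data_cube` and the sample sales rows are hypothetical, invented only for this example; production OLAP engines precompute and store such aggregates far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def data_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (2^n group-bys)."""
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            for row in rows:
                # Keep the value of retained dimensions, roll up the rest.
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical sales facts, invented for illustration only.
sales = [
    {"region": "North", "product": "A", "amount": 10.0},
    {"region": "North", "product": "B", "amount": 5.0},
    {"region": "South", "product": "A", "amount": 7.0},
]

cube = data_cube(sales, ["region", "product"], "amount")
```

Here `cube[("North", "ALL")]` holds the total for the North region across all products, and `cube[("ALL", "ALL")]` holds the grand total.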
To summarize, the different practices of data
warehouse architecture come from both engineering
experience and the results and contributions of the
research community. These practices can change
over time as new trends and challenges
arise for the data warehouse architecture. We
proceed to describe several emerging trends in
the data warehousing and software architecture
world and discuss how these trends influence the
practices of data warehouse architecture.
The past decade of data warehousing practice
has led enterprises into the era of integrating
and consolidating different sources of
information into centralized data warehouses.
However, rapidly changing business requirements
pose further challenges to the effectiveness
and efficiency of a "hub-and-spoke" data
warehouse architecture. In parallel, IT has
entered the Web 2.0 era, and concepts such as
service-oriented architecture, real-time data
warehousing, and master data management have
become widespread. We proceed to
elaborate on the major trends that are influencing
the data warehouse architecture.
Service-Oriented Data
Warehouse Architecture
Service-oriented architecture (SOA) is a collection
of services that communicate with each other.
Such communication, ranging from simple data
delivery to the coordination of multiple services,
forms the basis for orchestrating encapsulated
enterprise services. As the basic element of SOA,
a service can be understood as a well-defined,
loosely coupled, interoperable, and composable
software component or software agent. A service
must have well-defined interfaces based on
standard protocols, as well as quality-of-service
attributes or policies on how the interfaces can
be used.
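The notion of a service with a well-defined interface that can be composed with others can be sketched in a few lines of Python. This is only a minimal in-process illustration: the names `RecordService`, `TrimWhitespace`, `UppercaseCountry`, and `orchestrate` are hypothetical, and the standard protocols and quality-of-service policies that a real SOA deployment would need are deliberately left out.

```python
from typing import Protocol

Record = dict[str, str]


class RecordService(Protocol):
    """The well-defined interface every service in this sketch exposes."""

    def handle(self, record: Record) -> Record: ...


class TrimWhitespace:
    """Hypothetical service: strips surrounding whitespace from all values."""

    def handle(self, record: Record) -> Record:
        return {key: value.strip() for key, value in record.items()}


class UppercaseCountry:
    """Hypothetical service: normalizes the 'country' field to upper case."""

    def handle(self, record: Record) -> Record:
        out = dict(record)
        if "country" in out:
            out["country"] = out["country"].upper()
        return out


def orchestrate(services: list[RecordService], record: Record) -> Record:
    """Coordinate several services by passing the record through each in turn."""
    for service in services:
        record = service.handle(record)
    return record


result = orchestrate(
    [TrimWhitespace(), UppercaseCountry()],
    {"name": " Ada ", "country": "dk "},
)
```

Because the services depend only on the shared interface and not on each other, they are loosely coupled: either one can be replaced or reordered without changing `orchestrate`.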
Many existing data warehouses were designed
on the assumption that the workflows around them
are simple and predefined. For example, ETL
programs are often executed in one big batch
window, and all the transformation, conformation,
and data cleansing functions are tightly
bound to each other in the program. Under the SOA
concept, the traditional data warehouse architecture
needs to be broken down into different services
on the enterprise service bus. The management