Introduction - Data Warehouse Systems: Design and Implementation

Most applications focus on the analysis of data produced by objects like

customers, suppliers, and so on, assuming that these objects are static, in the

sense that their position in space and time is not relevant for the application

at hand. Nevertheless, many applications require the analysis of data about

moving objects, that is, objects that change their position in space and

time. The possibilities and interest of mobility data analysis have expanded

dramatically with the availability of positioning devices. Traffic data, for

example, can be captured as a collection of sequences of positioning signals

transmitted by the cars' GPS along their itineraries. Although such sequences

can be very long, they are often processed by being divided into segments of

movement called trajectories, which are the unit of interest in the analysis

of movement data. Extending data warehouses to cope with trajectory data

leads to the notion of trajectory data warehouses. These are studied in

Chap. 12.
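
To make the idea of dividing a long sequence of positioning signals into trajectories concrete, here is a minimal sketch. The text does not say how the segments are chosen, so the rule used here (start a new trajectory whenever the time gap between two consecutive signals exceeds a threshold) is only an assumption, as are the record layout, the function name split_into_trajectories, and the 300-second default.

```python
from typing import List, Tuple

# Hypothetical record layout: (timestamp in seconds, longitude, latitude)
Point = Tuple[float, float, float]


def split_into_trajectories(
    points: List[Point], max_gap_s: float = 300.0
) -> List[List[Point]]:
    """Split positioning signals into trajectories at large time gaps.

    Assumption: a gap longer than max_gap_s between two consecutive
    signals marks the end of one trajectory and the start of the next.
    """
    trajectories: List[List[Point]] = []
    current: List[Point] = []
    # Process the signals in time order, whatever order they arrived in.
    for point in sorted(points, key=lambda p: p[0]):
        if current and point[0] - current[-1][0] > max_gap_s:
            trajectories.append(current)
            current = []
        current.append(point)
    if current:
        trajectories.append(current)
    return trajectories


# Illustrative (made-up) signals: three close together, then two an hour later.
signals = [
    (0.0, 4.35, 50.85),
    (60.0, 4.36, 50.85),
    (120.0, 4.37, 50.86),
    (3600.0, 4.40, 50.88),
    (3660.0, 4.41, 50.88),
]
trips = split_into_trajectories(signals)
print([len(trip) for trip in trips])  # [3, 2]
```

Other segmentation criteria, such as stops at known locations or fixed time windows, could be substituted for the gap rule without changing the overall structure.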

1.3 New Domains and Challenges

Nowadays, the availability of enormous amounts of data is calling for a shift in

the way data warehouse and business intelligence practices have been carried

out since the 1990s. It is becoming clear that for certain kinds of business

intelligence applications, the traditional approach, where day-to-day business

data produced in an organization are collected in a huge common repository

for data analysis, needs to be revised to handle large-scale data

efficiently. In many emerging domains where business intelligence practices

are gaining acceptance, such as social networks or geospatial data analytics,

massive-scale data sources are becoming common, posing new challenges

to the data warehouse research community. In addition, new database

architectures are gaining momentum. Parallelism is becoming a must for large

data warehouse processing. Column-store database systems (like MonetDB

and Vertica) and in-memory database systems (like SAP HANA) are strong

candidates for data warehouse architectures since they deliver much better

performance than classic row-oriented databases for fact tables with a large

number of attributes. The MapReduce programming model is also becoming

increasingly popular, challenging traditional parallel database management

systems architectures. Even though at the time of writing this book it is

still not clear if this approach can be applied to all kinds of data warehouse

and business intelligence applications, many large data warehouses have been

built based on this model. As an example, the Facebook data warehouse

was built using Hadoop (an open-source implementation of MapReduce).

Chapter 13 discusses these new data warehousing challenges.
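
As a rough illustration of the MapReduce programming model mentioned above, the sketch below computes a typical data warehouse aggregate (total sales per category) in three phases: map, shuffle, and reduce. It runs in a single process and does not use Hadoop or any other framework; the record layout and the function names map_phase, shuffle, and reduce_phase are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical fact records: (product category, sales amount)
Sale = Tuple[str, float]


def map_phase(records: Iterable[Sale]) -> Iterator[Tuple[str, float]]:
    """Emit one (key, value) pair per input record."""
    for category, amount in records:
        yield category, amount


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group all emitted values by key, as a framework would between phases."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse each key's values into a single aggregate (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


# Illustrative (made-up) sales records.
sales = [("Beverages", 10.0), ("Seafood", 25.0), ("Beverages", 5.5)]
totals = reduce_phase(shuffle(map_phase(sales)))
print(totals)  # {'Beverages': 15.5, 'Seafood': 25.0}
```

In a real deployment, the map and reduce functions would run in parallel on many machines, each over its own partition of the data, and the framework would perform the shuffle across the network.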

We already commented that the typical method of loading data into a

data warehouse is through an ETL process. This process pulls data from

source systems periodically (e.g., daily, weekly, or monthly), obtaining a
