Chapter 13
New Data Warehouse Technologies
Big data refers to large collections of data that may be unstructured or
may grow so large and at such a high pace that it is difficult to manage
them with standard database systems or analysis tools. Examples of big data
include web logs, radio-frequency identification tags, sensor networks, and
social networks, among others. It has been reported that, at the time of
writing, Twitter and Facebook add and process 7 and 10 terabytes of data
every day, respectively. Approximately 80% of these
data are unstructured, and 90% of them have been created in the last 2
years. Management and analysis of these massive amounts of data demand
new solutions that go beyond traditional processes or software tools. All
of this has major implications for the way data warehousing will be
practiced in the future. For instance, big data analytics often requires
data latency (the time elapsed between the moment data are collected and
the moment an action based on them is taken) to be dramatically reduced.
Thus, near real-time data management techniques
must be developed. Also, external data sources like the semantic web may
need to be queried.
Technology has started to answer the challenges introduced by big data:
massively parallel processing, column-store database systems, and
in-memory database systems (IMDBSs) are some of the answers that we
discuss in this chapter. In Sect. 13.1, we present the MapReduce framework
and its most popular implementation, Apache Hadoop. In Sect. 13.2, we
study Hive and Pig Latin, two high-level languages that make it easier to
write MapReduce code. We then study two architectures increasingly
used in data warehousing: column-store database systems (Sect. 13.3) and
IMDBSs (Sect. 13.4). To give a complete picture, in Sect. 13.5 we briefly
describe several database systems that exploit these architectures. We
conclude the chapter with a study of real-time data warehousing (Sect. 13.6)
and the extraction, loading, and transformation (ELT) paradigm, which
is challenging the traditional ETL process (Sect. 13.7). These new data