The data in source systems is not “clean” or consistent across systems
Data input to transactional systems, if not carefully controlled, is likely to contain
errors and duplication. Often, a key portion of the data warehouse loading process
involves elimination of these errors through data transformation. Since multiple
source systems might differ in data definitions, data transformations during the
ETL (extraction, transformation, and load) process can be used to modify data into
a single common definition as well as improve its quality.
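As an illustration of the transformation step described above, here is a minimal sketch in Python. The two source systems, their field names, and the region mapping are all hypothetical; a real ETL process would typically use a dedicated tool and far more elaborate cleansing rules.

```python
# Hypothetical rows from two source systems that define the same data differently.
CRM_ROWS = [
    {"cust_id": "17", "name": " Acme Corp ", "region": "NE"},
    {"cust_id": "17", "name": "Acme Corp", "region": "NE"},  # duplicate entry
]
BILLING_ROWS = [
    {"customer": 42, "cust_name": "globex", "area": "Northeast"},
]

# Assumed mapping of each system's region values to one common definition.
REGION_MAP = {"NE": "NORTHEAST", "Northeast": "NORTHEAST"}


def transform_crm(row):
    """Convert a CRM row to the common warehouse definition."""
    return {
        "customer_id": int(row["cust_id"]),
        "name": row["name"].strip().upper(),
        "region": REGION_MAP[row["region"]],
    }


def transform_billing(row):
    """Convert a billing row to the common warehouse definition."""
    return {
        "customer_id": int(row["customer"]),
        "name": row["cust_name"].strip().upper(),
        "region": REGION_MAP[row["area"]],
    }


def load_ready_rows(crm_rows, billing_rows):
    """Transform both sources and drop exact duplicates, keeping first-seen order."""
    transformed = [transform_crm(r) for r in crm_rows]
    transformed += [transform_billing(r) for r in billing_rows]
    seen = set()
    result = []
    for row in transformed:
        key = (row["customer_id"], row["name"], row["region"])
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = load_ready_rows(CRM_ROWS, BILLING_ROWS)
for row in rows:
    print(row)
```

The two CRM rows differ only in stray whitespace, so they collapse into one row once the names are normalized, and both systems end up sharing a single region definition.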
The design required for an efficient data warehouse differs from the standard
normalized design for a relational database
Queries are typically submitted against a fact table, which may contain summarized
data. The schema design often used, a star schema, lets you access facts quite flexibly
along key dimensions or “lookup” values. (The star schema is described in more
detail later in this chapter.) For instance, a business user may want to compare the
total amount of sales, which comes from a fact table, by region, by store within the
region, and by item, all of which can be considered key dimensions. Today's data
warehouses often feature a hybrid schema that combines the star schema common in
previous-generation data marts with the third normal form schema for detailed data
that is common in OLTP systems and enterprise data warehouses.
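The sales comparison described above can be sketched with a tiny star schema. This example uses Python's built-in sqlite3 module purely for illustration; the table names (sales_fact, store_dim, item_dim), the columns, and the sample rows are assumptions made for the sketch and do not come from any particular product or from this chapter.

```python
import sqlite3

# In-memory database holding one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE item_dim  (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    store_id INTEGER REFERENCES store_dim(store_id),
    item_id  INTEGER REFERENCES item_dim(item_id),
    amount   REAL
);
INSERT INTO store_dim VALUES (1, 'Boston', 'East'), (2, 'Denver', 'West');
INSERT INTO item_dim VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 10, 50.0), (1, 11, 25.0), (2, 10, 70.0);
""")


def sales_by_dimensions(connection):
    """Total the fact table's sales amounts by region, store, and item."""
    query = """
        SELECT s.region, s.store_name, i.item_name, SUM(f.amount) AS total_sales
        FROM sales_fact f
        JOIN store_dim s ON s.store_id = f.store_id
        JOIN item_dim  i ON i.item_id  = f.item_id
        GROUP BY s.region, s.store_name, i.item_name
        ORDER BY s.region, s.store_name, i.item_name
    """
    return connection.execute(query).fetchall()


totals = sales_by_dimensions(conn)
for region, store, item, total in totals:
    print(region, store, item, total)
```

The fact table holds only keys and measures, while the descriptive attributes used for grouping live in the dimension tables. That separation is what lets the same facts be sliced along different dimensions.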
The data warehouse often serves as a target for meaningful data found on Big Data
platforms that optimally solve semi-structured data problems
Big Data can be described as semi-structured data containing data descriptors, data
values, and other miscellaneous data bits produced by sensors, social media, and
web-based data feeds. Given the amount of irrelevant data present, the processing
goal on a Big Data platform is to map the data and reduce it to data of value (hence
“MapReduce” callouts in programs written using languages such as Java and Python
that refine this data). This subset of Big Data is usually fed to a data warehouse
where it has value across the business and might be analyzed side by side with
structured data.
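The map-and-reduce idea described above can be sketched in plain Python, without any Big Data framework. The feed format, the field names, and the choice of an average as the reduced value are all hypothetical; an actual platform would distribute the map and reduce phases across many nodes.

```python
import json
from collections import defaultdict

# Hypothetical semi-structured feed: descriptors, values, and irrelevant noise.
RAW_FEED = [
    '{"sensor": "t1", "type": "temp", "value": 20.5}',
    '{"sensor": "t1", "type": "heartbeat"}',
    'corrupted line',
    '{"sensor": "t2", "type": "temp", "value": 18.0}',
    '{"sensor": "t1", "type": "temp", "value": 21.5}',
]


def map_record(line):
    """Map phase: emit (sensor, value) for temperature readings, skip the rest."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return  # unparseable input is treated as irrelevant data
    if isinstance(record, dict) and record.get("type") == "temp" and "value" in record:
        yield record["sensor"], record["value"]


def shuffle(pairs):
    """Group the mapped values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: collapse one sensor's readings into a single summary row."""
    return {"sensor": key, "readings": len(values), "avg_value": sum(values) / len(values)}


def run_job(lines):
    pairs = (pair for line in lines for pair in map_record(line))
    groups = shuffle(pairs)
    return [reduce_group(key, values) for key, values in sorted(groups.items())]


summary = run_job(RAW_FEED)
for row in summary:
    print(row)
```

The resulting summary rows are the kind of reduced, structured subset that could then be loaded into a data warehouse table.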
The Evolution of Data Warehousing and Business Intelligence
Gathering business intelligence from data warehouses is not a new idea. The use of
corporate data for strategic decision-making beyond simple tracking and day-to-day
operations has been going on for almost as long as computing itself.
Quite early, builders and users of operational systems recognized potential business
benefits of analyzing the data in complementary systems. In fact, much of the early
growth in personal computers was tied to the use of spreadsheets that performed analyses
using data downloaded from the operational systems. Business executives began
to direct IT efforts toward building solutions to better understand the business using
such data, leading to new business strategies. Today, solutions are commonly provided
in business areas such as customer relationship management, sales and marketing