Database Reference
In-Depth Information
of aggregated information, used primarily for reporting and analysis. Although the
various operational data systems are dealing with real-time transactions, such as pur-
chases and customer updates, a data warehousing solution is designed as a way to over-
come the problems created by these data silos.
Data warehouses are often described as subject oriented , because the design of the
data structures used is centered around asking questions about a particular subject or
concept. In other words, an operational database is designed to make sure that a cus-
tomer's contact information is correct and up to date. A data warehouse is concerned
with questions such as, “Over a certain period of time, which customers have made
in-store purchases and online purchases?”
Data warehouse tools are not meant to handle day-to-day updates of customer data.
An operational database might be consistently updated at any time. Meanwhile, a data
warehouse might be used to append a daily snapshot of data for analysis. As an exam-
ple, think of an airline reservation system. As customers make changes to their plans,
the reservation system is updated in real time accordingly. However, at the end of
each business day, a process might be defined that appends a snapshot of this data into
a warehouse application. Other data, perhaps from the airline's separate loyalty card
database, might be appended into the data warehouse as well. The next day, queries
can be run across this aggregated data to inform the airline about trends in frequent-
f lyer points over time.
Another basic requirement of a data warehouse is obviously to provide the ability to
build reports and ask questions about data. In order to pull in data from many different
sources and to quickly run queries, the data structure used in modern large-scale data
warehousing software requires careful modeling. One of the simplest ways to represent
data in a warehouse system is by using a star schema , so named because the diagram
of this schema resembles the points of a star. A record of data in a star schema is some-
what denormalized , meaning that data is replicated in various tables. In its simplest
form, a star schema represents data by using two types of tables: a fact table and one or
more dimension tables. This schema is best illustrated with an example. Let's consider
a single record of a customer transaction. In a star schema, this type of data might be
represented by a single fact table containing records of the customer ID, a product
ID, a cost, and a time of purchase. Within an organization, there's more to the data
than just this basic information. Each customer will also have associated contact infor-
mation, including an address, a phone number, and an email address. The product sold
has other metadata associated with it, as well, including its supplier information. These
additional types of data would be stored in dimension tables , with a key correspond-
ing to the fact table information. The additional information found in the dimension
tables completes the entire picture of data that encompasses the subject of a customer
purchase. The use of star schemas, and more complex data-modeling structures like
them, is meant to make querying as efficient as possible without the excessive joins
that might be found in a standard relational database.
The design and implementation of this type of schema, as well as more compli-
cated versions of this concept, goes well beyond the scope of this topic. In practice,
 
Search WWH ::




Custom Search