the data is in the shape that we want, we may want to merge it with other datasets to
provide additional context for our questions. Finally, after these steps are complete, we
are ready to start asking questions about our large datasets by running queries against
them. The answers to our questions can sometimes create more questions, so we need
to come up with a solution for iteratively querying our data. In the world of traditional
relational databases, large, aggregate queries over huge tables are not particularly fast
without additional indexing or the use of additional techniques for shaping the data
into a different form. Nonrelational databases are also not designed for this purpose.
What we need is another type of technology, one optimized for asking
questions and retrieving the results quickly.
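As a rough illustration of "shaping the data into a different form," the sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for a relational store. The table names `sales` and `sales_by_region` and the sample rows are hypothetical. An aggregate query over the detail table has to read every row. Precomputing the aggregate once into a small summary table lets later questions read only a handful of rows. An index on the grouping column is the other option mentioned above.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
    rows = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0), ("west", 1.0)]
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def total_by_region_full_scan(conn: sqlite3.Connection) -> dict:
    """Aggregate query that must read every row of the detail table."""
    cur = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(cur.fetchall())


def build_summary_table(conn: sqlite3.Connection) -> None:
    """Reshape the data: compute the aggregate once and store it in a small table."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()


def total_by_region_from_summary(conn: sqlite3.Connection) -> dict:
    """Later questions read the precomputed rows instead of rescanning 'sales'."""
    cur = conn.execute("SELECT region, total FROM sales_by_region")
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    # Alternative technique: an index on the grouping column.
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    build_summary_table(conn)
    print(total_by_region_full_scan(conn))
    print(total_by_region_from_summary(conn))
```

The summary table is only as fresh as its last rebuild. This trade-off between query speed and up-to-date data is one reason analytical workloads are often separated from day-to-day systems.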
When working with business data in large organizations, a common pattern is to
use one type of system for keeping track of day-to-day data and other types of systems
optimized to analyze the data. For customer-facing applications, which handle data
such as financial transactions, Web content, or personal information, relational
databases are often used. The main goal of these systems is to provide strong consistency
for data. If one of many millions of customers updates their address via a Web form, it
is important that this transaction is recorded accurately and immediately. Data
integrity is obviously very important in this context. Because large organizations depend on
multiple databases for customer data updates, many invest in tools that provide
transaction consistency across systems.
The workhorse databases that bravely face the outside world are often described as
“online transaction processing,” or OLTP, systems. There is no canonical software
design that typifies an OLTP system, as “transaction” might mean any number of
things to different people. Sometimes the word refers to a transaction in the technical
database sense, a unit of work that either completes entirely or not at all; at other
times it means that the database is handling inserts that describe customer transactions
of some type. In any case, a good way to think about OLTP systems is that they are
designed to handle operational data that is constantly being updated. In practice, an OLTP system is usually some type of
relational database, which is great for ensuring consistency and enforcing that data fits
a particular schema. As we mentioned earlier, the relational model isn't always the best
design for applications that need to run aggregate queries quickly
(see Chapter 3, “Building a NoSQL-Based Web App to Collect Crowd-Sourced
Data,” for more information about the role of relational and nonrelational databases).
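The phrase "enforcing that data fits a particular schema" can be illustrated with a short sketch. It assumes SQLite through Python's `sqlite3` module; the `customers` and `orders` tables and the helper `try_insert_order` are hypothetical. The `NOT NULL`, `CHECK`, and foreign-key constraints cause the database itself to reject malformed rows with `sqlite3.IntegrityError`, instead of relying on every application to validate its input.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    """Create hypothetical tables whose schema rejects malformed rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless this pragma is set.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " amount REAL NOT NULL CHECK (amount > 0))"
    )
    conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
    conn.commit()
    return conn


def try_insert_order(conn: sqlite3.Connection, customer_id, amount) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # roll back the failed insert, then re-raise
            conn.execute("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                         (customer_id, amount))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_orders_db()
    print(try_insert_order(conn, 1, 25.0))   # valid row
    print(try_insert_order(conn, 1, -5.0))   # violates CHECK (amount > 0)
    print(try_insert_order(conn, 42, 10.0))  # no customer 42: foreign-key violation
    print(try_insert_order(conn, 1, None))   # violates NOT NULL
```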
Our crucial frontline databases are optimized to make sure our customers' data is in
good shape, but they are not designed to make the analyst's job easier. Analysts need
to be able to ask questions about their data and use the resulting information to help
shape the strategy of the organization. Speed is a major concern because waiting
minutes or hours for a query to complete can be the difference between being able
to make a quick decision and missing a deadline.
The need to perform fast queries on data stored in relational databases is sometimes
addressed with an online analytical processing, or OLAP, system.
While the term OLAP looks confusingly similar to its cousin OLTP, it means
something very different. OLAP systems use techniques for shaping data originating from