Building a Data Dashboard with Google BigQuery - Data Just Right: Introduction to Large-Scale Data and Analytics

the data is in the shape that we want, we may want to merge it with other datasets to

provide additional context for our questions. Finally, after these steps are complete, we

are ready to start asking questions about our large datasets by running queries against

them. The answers to our questions can sometimes create more questions, so we need

to come up with a solution for iteratively querying our data. In the world of traditional

relational databases, large, aggregate queries over huge tables are not particularly fast

without additional indexing or the use of additional techniques for shaping the data

into a different form. Nonrelational databases are also not designed for this purpose.

What we need is another type of technology optimized for the purpose of asking

questions and retrieving the result quickly.

When working with business data in large organizations, a common pattern is to

use one type of system for keeping track of day-to-day data and other types of systems

optimized to analyze the data. For customer-facing applications, which handle data

such as financial transactions, Web content, or personal information, relational databases

are often used. The main goal of these systems is to provide high consistency

for data. If one of many millions of customers updates their address via a Web form, it

is important that this transaction is recorded accurately and immediately. Data integrity

is obviously very important in this context. Because large organizations depend on

multiple databases for customer data updates, many invest in tools to provide transaction

consistency across systems.

The workhorse databases that bravely face the outside world are often described as

“online transactional processing” or OLTP systems. There is no canonical software

design that typifies an OLTP system, as the term “transaction” might mean any number of

things to different people. Sometimes it refers to the technical database concept of a

“transaction,” or it can mean that the database is handling inserts that describe

customer transactions of some type. In any case, a good way to think about OLTP

systems is that they are designed to handle operational data that might be updated

constantly. In practice, an OLTP system is usually some type of

relational database, which is great for ensuring consistency and enforcing that data fits

a particular schema. As we mentioned earlier, the relational model isn't always the best

design for those applications that require running aggregate database queries quickly

(see Chapter 3, “Building a NoSQL-Based Web App to Collect Crowd-Sourced

Data,” for more information about the role of relational and nonrelational databases).
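
To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names, sample rows, and the helper functions update_city and revenue_by_city are all hypothetical and chosen only for illustration. The first helper is the kind of small, single-row update an OLTP system handles constantly; the second is the kind of aggregate query an analyst would run, which has to read every matching row.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York"), (3, "Alan", "London")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5), (3, 7.5)],
)
conn.commit()


def update_city(conn, customer_id, new_city):
    """Transactional-style work: change one row, atomically."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE customers SET city = ? WHERE id = ?",
            (new_city, customer_id),
        )


def revenue_by_city(conn):
    """Analytical-style work: join and aggregate over all orders."""
    rows = conn.execute(
        "SELECT c.city, SUM(o.amount) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.city ORDER BY c.city"
    ).fetchall()
    return dict(rows)


update_city(conn, 3, "Paris")
totals = revenue_by_city(conn)
print(totals)
```

On a handful of rows both operations are instant. The difference only shows up at scale: the update still touches one row, while the aggregate query must scan the whole orders table.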

Our crucial frontline databases are optimized to make sure our customers' data is in

good shape, but they are not designed to make the analyst's job easier. Analysts need

to be able to ask questions about their data and use the resulting information to help

shape the strategy of the organization. Speed is a major concern because waiting minutes

or hours for a query to complete can be the difference between being able

to make a quick decision and missing a deadline.

The need to perform fast queries on data stored in relational databases is sometimes

addressed by using an online analytical processing, or OLAP, system.

While the term OLAP looks confusingly similar to its cousin OLTP, it means something

very different. OLAP systems use techniques for shaping data originating from
