Building a Data Dashboard with Google BigQuery - Data Just Right: Introduction to Large-Scale Data and Analytics

to disk. Unlike MapReduce, Dremel does its best to not use a disk at all. For aggregate queries that require a full table scan of data, BigQuery does all of its work using available system memory, which makes for speedy query performance. For more information on how developers are using in-memory data systems such as Redis and MemSQL to increase data throughput, see Chapter 3.

The use cases for batch processing with MapReduce frameworks and the iterative querying tasks solved by BigQuery are complementary. In fact, a commercial application may be even more likely to combine several specialized systems: a key-value store for high-volume data collection (see Chapter 3), a MapReduce framework for normalization or processing, and an analytical database for questioning the collected data quickly.
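To make the division of labor concrete, here is a minimal Python sketch of such a three-stage system. Everything in it is an illustrative assumption rather than anything from a real deployment: a plain dictionary stands in for the key-value store, two small functions stand in for a MapReduce job, an in-memory SQLite database stands in for an analytical database such as BigQuery, and the event records and the names `map_event`, `reduce_counts`, and `totals_by_user` are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Stage 1: high-volume collection. A plain dict stands in for a key-value
# store; the keys and raw values below are made-up sample data.
raw_events = {
    "evt:1": "alice,  PageView ",
    "evt:2": "bob,click",
    "evt:3": "alice,click",
    "evt:4": "ALICE,pageview",
}


# Stage 2: batch normalization, written in a map/reduce style.
def map_event(value):
    """Map step: turn one raw record into a ((user, event), 1) pair."""
    user, event = (part.strip().lower() for part in value.split(","))
    return (user, event), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts that share the same (user, event) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


counts = reduce_counts(map_event(value) for value in raw_events.values())

# Stage 3: load the normalized rows into an analytical store. In-memory
# SQLite is only a stand-in here for a hosted analytical database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE activity (user_name TEXT, event_type TEXT, total INTEGER)")
db.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(user, event, total) for (user, event), total in counts.items()],
)


def totals_by_user(connection):
    """Ad hoc aggregate query: total number of events per user."""
    rows = connection.execute(
        "SELECT user_name, SUM(total) FROM activity "
        "GROUP BY user_name ORDER BY user_name"
    )
    return rows.fetchall()


print(totals_by_user(db))
```

Each stage can be swapped out independently, which is the point of the architecture: the collection layer is tuned for writes, the batch layer for bulk transformation, and the analytical layer for fast, iterative questions.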

Dremel is an internal Google tool, so how can it be used as a practical data solution for everyone else? Similar to the way in which Amazon released their key-value datastore DynamoDB, Google has provided the technology used by Dremel to outside developers via an application programming interface (API). This service is known as Google BigQuery.

Commonly used applications such as Web mail, social networking, and music are becoming available as cloud services to be consumed through an interface such as a Web browser or a mobile device. Instead of relying on desktop hardware for processing power, the application space is being re-envisioned through standards-based protocols. Although there are many pitfalls to overcome, the advantages to this model include device independence, the potential for lowering costs, and new opportunities for social collaboration.

For data scientists, offloading hardware responsibilities to service providers is becoming more common. The move to hosted services is not just a result of taking advantage of the latent processing capabilities of cloud environments. Using a cloud service also allows analysts and software engineers to more easily collaborate on tasks without having to manage hardware. As we've seen in other chapters, when data sizes are large, using cloud-based services can sometimes be the only way to economically solve a data challenge.

As we've seen in example after example in this book, a common theme of working with data at scale has been innovative rethinking about technology. Nonrelational databases, such as key-value datastores, were created to address the difficulty of scaling the relational model to Web-scale data.

BigQuery is not a database in the traditional sense, and it exhibits characteristics different from those of a traditional relational database. Although it is possible for BigQuery to store data, it is an append-only system. Individual records cannot be updated; it's only possible to append data to existing data tables. Also, unlike a standard relational database, the system doesn't support the complete range of standard
