to disk. Unlike MapReduce, Dremel does its best to not use a disk at all. For aggregate
queries that require a full table scan of data, BigQuery does all of its work using avail-
able system memory, which makes for speedy query performance. For more informa-
tion on how developers are using in-memory data systems such as Redis and MemSQL
to increase data throughput, see Chapter 3.
The use cases for batch processing with MapReduce frameworks and the iterative
querying tasks solved by BigQuery are complementary. In fact, a commercial
application is even more likely to combine several specialized systems: a key-value
store for high-volume data collection (see Chapter 3), a MapReduce framework for
normalization or processing, and an analytical database for querying the collected
data quickly.
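The division of labor described above can be illustrated with a toy sketch. It is only an illustration under stated assumptions: a Python dict stands in for the key-value store, two small functions mimic the map and reduce steps, and an in-memory SQLite table stands in for the analytical database. The event format and all names (raw_events, map_event, reduce_counts, run_pipeline) are hypothetical and do not come from any of the systems discussed here.

```python
import sqlite3
from collections import defaultdict

# Stage 1: collection. A plain dict stands in for a key-value store
# (hypothetical event format: "user,action,count").
raw_events = {
    "evt:1": "  Alice ,play,3",
    "evt:2": "bob,PLAY,5",
    "evt:3": "alice,skip,1",
    "evt:4": "Bob ,play,2",
}


def map_event(value):
    """Map step: normalize one raw record into a ((user, action), count) pair."""
    user, action, count = (field.strip().lower() for field in value.split(","))
    yield (user, action), int(count)


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (user, action) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_pipeline(events):
    """Run the batch step, then load the results into a queryable table."""
    mapped = (pair for value in events.values() for pair in map_event(value))
    totals = reduce_counts(mapped)

    # Stage 3: an in-memory SQLite table stands in for the analytical database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE activity (username TEXT, action TEXT, total INTEGER)")
    db.executemany(
        "INSERT INTO activity VALUES (?, ?, ?)",
        [(user, action, total) for (user, action), total in totals.items()],
    )
    db.commit()
    return db


db = run_pipeline(raw_events)

# Ad hoc question asked of the processed data: total plays per user.
rows = db.execute(
    "SELECT username, SUM(total) FROM activity "
    "WHERE action = 'play' GROUP BY username ORDER BY username"
).fetchall()
print(rows)
```

Each stage does only the job it is suited for: the store accepts raw writes without validation, the batch step cleans and aggregates, and the query layer answers questions that were not known when the data was collected.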
BigQuery: Data Analytics as a Service
Dremel is an internal Google tool, so how can it be used as a practical data solution for
everyone else? Similar to the way in which Amazon released their key-value datastore
DynamoDB, Google has provided the technology used by Dremel to outside develop-
ers via an application programming interface (API). This service is known as Google
BigQuery.
Commonly used applications such as Web mail, social networking, and music are
becoming available as cloud services to be consumed through an interface such as a
Web browser or a mobile device. Instead of relying on desktop hardware for process-
ing power, the application space is being re-envisioned through standards-based pro-
tocols. Although there are many pitfalls to overcome, the advantages to this model
include device independence, the potential for lowering costs, and new opportunities
for social collaboration.
For data scientists, offloading hardware responsibilities to service providers is
becoming more common. The move to hosted services is not just a result of taking
advantage of the latent processing capabilities of cloud environments. Using a cloud
service also allows analysts and software engineers to more easily collaborate on tasks
without having to manage hardware. As we've seen in other chapters, when data sizes
are large, using cloud-based services can sometimes be the only way to economically
solve a data challenge.
As we've seen in example after example in this book, a common theme of working
with data at scale has been innovative rethinking about technology. Nonrelational
databases, such as key-value datastores, were created to address the difficulty of scaling
the relational model to Web-scale data.
BigQuery is not a database in the traditional sense, and it exhibits characteristics
different from those of a traditional relational database. Although it is possible for
BigQuery to store data, it is an append-only system. Individual records cannot be
updated; it's only possible to append data to existing data tables. Also, unlike a stan-
dard relational database, the system doesn't support the complete range of standard
 
 