Understanding Query Execution - Google BigQuery Analytics


can make more sense because BigQuery doesn't currently support updates

or deletes. Finally, if you are running a huge number of queries, you'll likely

run up against the BigQuery quota policy.

There are drawbacks to a relational database, too. If you are running ad-hoc

queries, and don't have a lot of time and energy to spend optimizing your

data layout, you may end up spending a lot of time waiting for your queries

to run. If you have large tables, the database software you're using may not

scale well. This last point is why a number of new “NoSQL” databases exist

and was the rationale for creating BigQuery in the first place.

MapReduce

MapReduce, as a Big Data processing architecture, has only been around for

approximately 10 years. It was developed at Google by Jeff Dean and Sanjay

Ghemawat as a mechanism to perform computations over large data sets

by applying principles from functional programming (the Map and Reduce

operations). The primary idea is that you decompose your computation into

two phases: Map, which transforms the data, and Reduce, which combines

the results.
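
The two-phase idea can be sketched in a few lines of Python. This is a minimal single-process illustration, not how Google's MapReduce or Hadoop is implemented; the word-count task and the names map_phase, reduce_phase, and run_map_reduce are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: transform one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_phase(key, values):
    """Reduce: combine every value emitted for a single key."""
    return key, sum(values)


def run_map_reduce(records):
    """Run Map over every record, group by key, then run Reduce per key."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_map_reduce(["big data big queries", "big tables"]))
```

Because map_phase looks at one record at a time and reduce_phase looks at one key at a time, both phases could in principle be spread across many machines, which is the property the decomposition is meant to provide.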

After Google published the principles of MapReduce in a research paper,

Doug Cutting picked up the concept and began building an open source

version that he called Hadoop, after his son's toy elephant. In the past couple

of years, Hadoop has gained rapid popularity because of companies such as

Yahoo! that productized it and startups such as Cloudera and MapR that

pushed the boundaries of what it could do. Most people who use MapReduce

now use it via Hadoop.

Comparisons of BigQuery to MapReduce are included because many people

who consider using BigQuery also wonder why they shouldn't just use

something such as Hive on top of Hadoop. This section shows the

architecture of MapReduce and why it is generally more suited toward batch

workloads than interactive exploration of your data.

MapReduce Design

Despite the name, MapReduce isn't just Map and Reduce; it is actually

Map, Combine, Shuffle, and Reduce. Moreover, what you usually think

of as MapReduce encompasses not just the computation itself, but also a

number of other technologies that enable it, such as a distributed filesystem.
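
The four stages named above can be sketched as a toy word count running in one process. What each stage does here follows the commonly described roles (Combine pre-aggregates a mapper's output locally, Shuffle routes each key to one reducer) and is an assumption, not something taken from this text; all function names, the CRC-based partitioning, and the use of in-memory lists in place of a distributed filesystem are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


def map_stage(split):
    """Map: emit (word, 1) for every word in one input split."""
    for line in split:
        for word in line.split():
            yield word.lower(), 1


def combine_stage(pairs):
    """Combine: pre-aggregate one mapper's output before it is shuffled."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle_stage(combined_outputs, num_reducers):
    """Shuffle: send every value for a given key to the same reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in combined_outputs:
        for key, value in output:
            # A stable hash keeps the key-to-reducer assignment deterministic.
            index = zlib.crc32(key.encode("utf-8")) % num_reducers
            partitions[index][key].append(value)
    return partitions


def reduce_stage(partition):
    """Reduce: produce the final total for each key in one partition."""
    return {key: sum(values) for key, values in partition.items()}


def word_count(splits, num_reducers=2):
    """Chain the four stages over a list of input splits."""
    combined = [combine_stage(map_stage(split)) for split in splits]
    partitions = shuffle_stage(combined, num_reducers)
    result = {}
    for partition in partitions:
        result.update(reduce_stage(partition))
    return result


if __name__ == "__main__":
    print(word_count([["big data big"], ["big queries", "data"]]))
```

In this sketch the Combine step shrinks each mapper's output before the Shuffle, so fewer pairs would need to cross the network in a real cluster.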
