can make more sense because BigQuery doesn't currently support updates
or deletes. Finally, if you are running a huge number of queries, you'll likely
run up against the BigQuery quota policy.
There are drawbacks to a relational database, too. If you are running ad-hoc
queries, and don't have a lot of time and energy to spend optimizing your
data layout, you may end up spending a lot of time waiting for your queries
to run. If you have large tables, the database software you're using may not
scale well. This last point is why a number of new “NoSQL” databases exist,
and it was the rationale for creating BigQuery in the first place.
MapReduce
MapReduce, as a Big Data processing architecture, has only been around for
approximately 10 years. It was developed at Google by Jeff Dean and Sanjay
Ghemawat as a mechanism to perform computations over large data sets
by applying principles from functional programming (the Map and Reduce
operations). The primary idea is that you decompose your computation into
two phases: Map, which transforms the data, and Reduce, which combines
the results.
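The two-phase decomposition can be illustrated with a minimal, single-process sketch in Python. This is not how Google's MapReduce or Hadoop is implemented; the word-count task and the names `map_phase`, `reduce_phase`, and `documents` are assumptions chosen only for illustration.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: transform each input record into (key, value) pairs."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)


def reduce_phase(pairs):
    """Reduce: combine all values that share a key into one result."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Hypothetical input; a real job would read from distributed storage.
documents = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = reduce_phase(map_phase(documents))
print(word_counts)
```

Because the map function looks at one record at a time and the reduce function only needs the values for one key at a time, both phases could in principle be spread across many machines, which is what makes the model attractive for large data sets.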
After Google published the principles of MapReduce in a research paper,
Doug Cutting picked up the concept and began building an open source
version that he called Hadoop, after his son's toy elephant. In the past couple
of years, Hadoop has gained rapid popularity because of companies such as
Yahoo! that productized it and startups such as Cloudera and MapR that
pushed the boundaries of what it could do. Most people who use MapReduce
now use it via Hadoop.
Comparisons of BigQuery to MapReduce are included because many people
who consider using BigQuery also wonder why they shouldn't just use
something such as Hive on top of Hadoop. This section shows the
architecture of MapReduce and why it is generally more suited toward batch
workloads than interactive exploration of your data.
MapReduce Design
Despite the name, MapReduce isn't just Map and Reduce; it is really
Map, Combine, Shuffle, and Reduce. Moreover, what you usually think
of as MapReduce encompasses not just the computation itself, but also a
number of other technologies that enable it, such as a distributed filesystem.
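As a rough sketch of how the four stages fit together, the following Python code simulates them in a single process for a word count. Everything here is an assumption made for illustration: the function names, the two input splits, the use of `zlib.crc32` to assign keys to reducers, and the choice of two reducers. A real framework runs each stage on separate machines and reads its input from the distributed filesystem.

```python
import zlib
from collections import Counter, defaultdict


def map_split(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for line in split for word in line.lower().split()]


def combine(pairs):
    """Combine: pre-aggregate on the mapper to shrink what gets shuffled."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(combined_outputs, num_reducers):
    """Shuffle: route every key to exactly one reducer and group its values."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in combined_outputs:
        for key, value in output:
            # Deterministic hash so a given key always lands on the same reducer.
            target = zlib.crc32(key.encode("utf-8")) % num_reducers
            partitions[target][key].append(value)
    return partitions


def reduce_partition(partition):
    """Reduce: collapse the grouped values for each key into a final count."""
    return {key: sum(values) for key, values in partition.items()}


# Two hypothetical input splits, as if stored on two different machines.
splits = [["the quick brown fox", "the fox"], ["the lazy dog"]]

combined = [combine(map_split(split)) for split in splits]
partitions = shuffle(combined, num_reducers=2)

results = {}
for partition in partitions:
    results.update(reduce_partition(partition))
print(results)
```

The combine step is optional in this sketch: removing it gives the same final counts, but more pairs would cross the shuffle, which in a real cluster means more network traffic.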