These principles, along with the need to keep costs down, led to new
computing architectures. Over a short period of time, Google produced
three technologies that inspired the Big Data revolution:
• Google File System (GFS): A distributed, cluster-based filesystem.
GFS assumes that any disk can fail, so it stores each piece of data in
multiple locations. The data stays available even when one of the disks
holding it crashes.
• MapReduce: A computing paradigm that divides problems into easily
parallelizable pieces and orchestrates running them across a cluster of
machines.
• Bigtable: A forerunner of the NoSQL database, Bigtable enables
structured storage to scale out to multiple servers. Bigtable is also
replicated, so failure of any particular tablet server doesn't cause data
loss.
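To make the MapReduce idea concrete, here is a minimal single-machine sketch of a word count, the customary introductory example. It is an illustration only and not Google's or Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool stands in for the cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents, workers=4):
    # Each split is mapped independently of the others, which is what makes
    # the map step easy to parallelize. Threads stand in for machines here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, documents))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
```

The point of the structure is that `map_phase` never looks at more than one split and `reduce_phase` never looks at more than one key, so a real framework is free to schedule both across as many machines as it has.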
What's more, Google published papers on these technologies, which enabled
others to emulate them outside of Google. Doug Cutting and other open
source contributors integrated the concepts into a tool called Hadoop.
Although Hadoop is considered to be primarily a MapReduce
implementation, its ecosystem also includes GFS and Bigtable clones,
which are called HDFS and HBase, respectively.
Armed with these three technologies, Google replaced nearly all the
off-the-shelf software usually used to run a business. With a couple of
exceptions, it didn't need a traditional SQL database, and it didn't need
an e-mail server because its Gmail service was built on top of these
technologies.
Big Data Stack 2.0 (and Beyond)
The three technologies—GFS, MapReduce, and Bigtable—made it possible
for Google to scale out its infrastructure. However, they didn't make it easy.
Over the next few years, a number of problems emerged:
• MapReduce is hard. It can be difficult to set up, and it can be just as
difficult to decompose your problem into Map and Reduce phases. If you
need multiple MapReduce rounds (which is common for many real-world
problems), you have to decide how to carry state between phases and
how to handle partial failures without restarting the whole job.
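The multi-round problem can be sketched as follows. This is a toy, single-process illustration under stated assumptions, not how any particular framework does it: `run_round`, `run_pipeline`, the mapper and reducer helpers, and the `round_N.json` checkpoint files are all hypothetical names. Each round's output is written to disk, so a rerun after a failure resumes from the last completed round instead of starting over.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_round(records, mapper, reducer):
    """One MapReduce round: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


def run_pipeline(records, rounds, checkpoint_dir):
    """Run rounds in order, saving each round's output so a rerun can resume."""
    for index, (mapper, reducer) in enumerate(rounds):
        path = os.path.join(checkpoint_dir, f"round_{index}.json")
        if os.path.exists(path):
            # This round finished in an earlier attempt; reuse its output.
            with open(path) as handle:
                records = json.load(handle)
            continue
        records = run_round(records, mapper, reducer)
        # Write to a temporary file first, then rename, so a crash
        # mid-write cannot leave a partial checkpoint behind.
        with open(path + ".tmp", "w") as handle:
            json.dump(records, handle)
        os.replace(path + ".tmp", path)
    return records


# Round 1: count how often each word occurs.
def count_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return [word, sum(ones)]


# Round 2: group words by how often they occur.
def invert_mapper(record):
    word, count = record
    return [(count, word)]


def invert_reducer(count, words):
    return [count, sorted(words)]


if __name__ == "__main__":
    rounds = [(count_mapper, count_reducer), (invert_mapper, invert_reducer)]
    with tempfile.TemporaryDirectory() as workdir:
        print(run_pipeline(["a b a", "b c"], rounds, workdir))
```

Even in this small form, the state handling (checkpoint naming, atomic writes, resume logic) takes about as much code as the computation itself, which is the difficulty the bullet above describes.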