• MapReduce can be slow. If you want to ask questions of your data, you
have to wait minutes or hours to get the answers. Moreover, you have to
write custom C++ or Java code each time you want to change the
question that you're asking.
• GFS, while improving durability of the data (since it is replicated
multiple times), can suffer from reduced availability, since the metadata
server is a single point of failure.
• Bigtable has problems in a multidatacenter environment. Most services
run in multiple locations; Bigtable replication between datacenters is
only eventually consistent (meaning that data that gets written out will
show up everywhere, but not immediately). Individual services spend a
lot of redundant effort babysitting the replication process.
• Programmers (even Google programmers) have a really difficult time
dealing with eventual consistency. This same problem occurred when
Intel engineers tried improving CPU performance by relaxing the
memory model to be eventually consistent; it caused lots of subtle bugs
because the hardware no longer behaved the way people's mental model
of it said it should.
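To make the eventual-consistency problem concrete, here is a minimal sketch in Python. It is not how Bigtable replicates data; the Replica class, its inbox queue, and the key names are hypothetical and exist only to show why a read in one location can miss a write that already succeeded in another.

```python
from collections import deque


class Replica:
    """One datacenter's copy of a key-value table (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        # Replicated writes that have arrived but have not been applied yet.
        self.inbox = deque()

    def write(self, key, value, peers):
        # The write is applied locally right away...
        self.data[key] = value
        # ...but peers only receive it as a pending message.
        for peer in peers:
            peer.inbox.append((key, value))

    def read(self, key):
        return self.data.get(key)

    def apply_replication(self):
        # Stand-in for the background replication process catching up.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value


us = Replica("us")
eu = Replica("eu")

us.write("user:42", "new-email@example.com", peers=[eu])
stale = eu.read("user:42")   # replication has not run yet, so the write is invisible here
eu.apply_replication()
fresh = eu.read("user:42")   # now the write has shown up
```

Any code that reads from the second replica between the write and the replication step has to cope with the missing value, which is the kind of case that is easy to overlook.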
Over the next several years, Google built a number of additional
infrastructure components that refined the ideas from the 1.0 stack:
Colossus: A distributed filesystem that works around many of the
limitations in GFS. Unlike many of the other technologies used at
Google, Colossus' architecture hasn't been publicly disclosed in research
papers.
Megastore: A geographically replicated, consistent NoSQL-type
datastore. Megastore uses the Paxos algorithm to ensure consistent
reads and writes. This means that once a write in one datacenter
succeeds, it is immediately available in all other datacenters.
Spanner: A globally replicated datastore that can handle data locality
constraints, like “This data is allowed to reside only in European
datacenters.” Spanner managed to solve the problem of global time
ordering in a geographically distributed system by using atomic clocks
to guarantee synchronization to within a known bound.
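One way a known clock-error bound can be turned into a global ordering is to pick a timestamp and then wait until that timestamp is certain to be in the past everywhere. The Python sketch below illustrates that idea only. The BoundedClock class, the commit function, the simulated time, and the 5 ms bound are all assumptions made for the example, not Spanner's actual interface.

```python
class BoundedClock:
    """Hypothetical clock whose error is known to be at most `epsilon` seconds."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.true_time = 0.0  # simulated real time, advanced manually

    def advance(self, seconds):
        self.true_time += seconds

    def now(self):
        # Returns (earliest, latest); real time is guaranteed to lie in between.
        return (self.true_time - self.epsilon, self.true_time + self.epsilon)


def commit(clock, step=0.001):
    """Pick a timestamp, then wait until it is definitely in the past."""
    _, latest = clock.now()
    timestamp = latest
    waited = 0.0
    # Keep waiting while the timestamp could still be in the future somewhere.
    while clock.now()[0] <= timestamp:
        clock.advance(step)  # stand-in for sleeping
        waited += step
    return timestamp, waited


clock = BoundedClock(epsilon=0.005)  # assumed 5 ms uncertainty
ts1, waited1 = commit(clock)
ts2, waited2 = commit(clock)
```

In this sketch each commit waits roughly twice the uncertainty bound, so a tighter bound means a shorter wait. That is why accurate clocks matter for this kind of scheme.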
FlumeJava: A system that allows you to write idiomatic Java code that
runs over collections of Big Data. Flume operations get compiled and