Durability and Availability
This section describes the current implementation of how your data is
stored in BigQuery. Although BigQuery does not make official data reliability
guarantees, every table in BigQuery is replicated several times. Data in a
CFS cell is replicated three ways to provide durability and reduce tail
latency. BigQuery also asynchronously replicates all tables to multiple
geographically distributed datacenters, so a power failure in one
datacenter should not affect your ability to get to your data.
When you initially load data into BigQuery, it is added only to a single
datacenter, so there is a small window where the data is singly homed
(even though it is still replicated three ways within the datacenter). When
the load completes, a synchronization process copies that data to other
remote datacenters. When you query your table, you should always get the
most recent version of your data, even if it has not finished replicating
everywhere.
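To make the load-then-replicate sequence concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not BigQuery's implementation: the datacenter names are hypothetical, asynchronous replication is modeled as an explicit sync() call rather than a background process, and queries are simply routed to a datacenter that already holds the newest version of the table.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedTable:
    """Toy model (hypothetical): loads land in one home datacenter, and a
    separate sync step copies them to remote datacenters."""
    home: str
    remotes: list[str]
    versions: dict[str, int] = field(default_factory=dict)
    data: dict[str, list] = field(default_factory=dict)

    def __post_init__(self):
        # Every datacenter starts out with an empty table at version 0.
        for dc in [self.home, *self.remotes]:
            self.versions[dc] = 0
            self.data[dc] = []

    def load(self, rows):
        # New data is written only to the home datacenter (singly homed).
        self.data[self.home] = self.data[self.home] + list(rows)
        self.versions[self.home] += 1

    def sync(self):
        # Stand-in for asynchronous replication: copy the home data outward.
        for dc in self.remotes:
            self.data[dc] = list(self.data[self.home])
            self.versions[dc] = self.versions[self.home]

    def copies_of_latest(self):
        # Datacenters that currently hold the newest version.
        latest = max(self.versions.values())
        return [dc for dc, v in self.versions.items() if v == latest]

    def query(self):
        # Route the read to a datacenter that holds the newest version.
        dc = self.copies_of_latest()[0]
        return list(self.data[dc])
```

Between load() and sync(), copies_of_latest() returns only the home datacenter, which corresponds to the singly homed window, yet query() still returns the newly loaded rows.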
When you have a distributed filesystem with a large number of disks (at
least tens of thousands), replication is essential because some of those
disks fail every day. Disks can fail in many different ways: they can develop
errors due to cosmic rays, their motors can slowly burn out, the machine
that hosts them can have a bad power supply, or the magnetic disks can lose
their sensitivity. Having multiple choices of where to read data from comes
in handy when you read from a lot of disks, because the odds are high that
at least one copy of any given piece of data is on a healthy disk.
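The arithmetic behind that last point is simple if you assume, as a simplification, that each replica is unavailable independently with the same probability p: with k replicas, the chance that none can be read is p^k. The Python sketch below computes that and adds a toy read path that returns the first healthy replica it finds. The failure probability and replica names are hypothetical.

```python
import random


def p_at_least_one_healthy(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is readable,
    assuming each copy is unavailable independently with probability p_fail."""
    return 1.0 - p_fail ** replicas


def read_first_healthy(replicas, rng=random):
    """Toy read path: try replicas in random order, return the first healthy one.

    `replicas` is a list of (name, is_healthy) pairs; returns None if no
    replica is healthy.
    """
    order = list(replicas)
    rng.shuffle(order)  # spread reads across copies instead of always hitting one
    for name, healthy in order:
        if healthy:
            return name
    return None
```

With a hypothetical 1% chance that any given disk is unavailable, three replicas leave roughly a one-in-a-million chance (0.01 ** 3) that no copy can be read. Real failures are often correlated (a shared bad power supply, for example), so the independence assumption makes this an optimistic estimate.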
Durability is a measure of the persistence of data. Data that is replicated
nine ways and has adequate failure replacement policies (that is, when a
disk fails it isn't just removed; the copy of the data it held is re-created
on a fresh disk) has an expected lifetime on the order of 1 million years (see
http://cseweb.ucsd.edu/users/pasquale/Papers/ipdps10.pdf). That said, Google
BigQuery does not currently publish a durability guarantee for its data. As
with all data, if it is critical, make a backup. BigQuery lets you easily copy
a table to back it up, or you can export it to Google Cloud Storage for
safekeeping.
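To see why replacing failed copies matters so much, consider a textbook-style simplified Markov model; it is not the model from the cited paper and does not reproduce its nine-way, million-year figure. Assume each live copy fails independently at rate 1/MTTF, that while any copy is missing one replacement is rebuilt at rate 1/MTTR, and that data is lost only when the last copy fails. The expected time to data loss (MTTDL) then follows from a short backward recursion. All parameter values used with it are hypothetical.

```python
def mttdl(replicas: int, mttf: float, mttr: float) -> float:
    """Mean time to data loss for `replicas` copies under a simplified model.

    Assumptions: independent, exponentially distributed copy lifetimes with
    mean `mttf`; one missing copy is rebuilt at a time with mean duration
    `mttr` (pass float("inf") for "failed disks are never replaced").
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    lam, mu = 1.0 / mttf, 1.0 / mttr  # failure rate per copy, repair rate
    # d is the expected time to drop from i live copies to i - 1.
    # With all copies present only failures occur: d_r = 1 / (r * lam).
    d = 1.0 / (replicas * lam)
    total = d
    # Balancing failures against repairs gives d_i = (1 + mu * d_{i+1}) / (i * lam).
    for i in range(replicas - 1, 0, -1):
        d = (1.0 + mu * d) / (i * lam)
        total += d
    return total  # sum of the d_i is the expected time from r copies to 0
```

With a hypothetical disk MTTF of 5 years and a rebuild time of 0.01 years, mttdl(2, 5.0, 0.01) is about 1,257.5 years, while mttdl(2, 5.0, float("inf")), with no replacement at all, is only 7.5 years. The gap is the effect of the replacement policy described above.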
Availability is a measure of your ability to get to your data when you want it.
BigQuery does publish a guarantee that the service will be available 99.9%
of the time. This guarantee includes the ability to get to your data. The most