Database Reference
In-Depth Information
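Row Id       | Column Id         | Timestamp | Column Value
-------------+-------------------+-----------+-------------
com.cnn.www  | anchor:cnnsi.com  | t9        | CNN
com.cnn.www  | anchor:my.look.ca | t8        | CNN.com

Row Id       | Column Id         | Timestamp | Column Value
-------------+-------------------+-----------+-------------
com.cnn.www  | contents:         | t6        | <html>…
com.cnn.www  | contents:         | t5        | <html>…
com.cnn.www  | contents:         | t3        | <html>…

Fig. 3.2 Sample BigTable structure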
range from throughput-oriented batch-processing jobs to latency-sensitive serving
of data to end users. The Bigtable clusters used by these products span a wide range
of configurations, from a handful to thousands of servers, and store up to several
hundred terabytes of data.
Bigtable does not support a full relational data model. However, it provides
clients with a simple data model that supports dynamic control over data layout and
format. In particular, a Bigtable is a sparse, distributed, persistent multidimensional
sorted map. The map is indexed by a row key, column key, and a timestamp. Each
value in the map is an uninterpreted array of bytes. Thus, clients usually need to
serialize various forms of structured and semi-structured data into these strings.
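
As a rough illustration of this data model, the sketch below represents a table as a Python dictionary keyed by (row key, column key, timestamp) whose values are raw bytes. The row and column names, the helper names encode and decode, and the choice of JSON as the serialization format are all assumptions made for the example; Bigtable itself does not prescribe any format.

```python
import json
from typing import Any, Dict, Tuple

# Key of the map: (row key, column key, timestamp).
CellKey = Tuple[str, str, int]


def encode(value: Any) -> bytes:
    """Serialize a structured value into an uninterpreted byte string."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> Any:
    """Inverse of encode(); only the client knows how to read the bytes."""
    return json.loads(raw.decode("utf-8"))


# The table is sparse: only cells that were actually written exist.
# Row keys and column names below are hypothetical.
table: Dict[CellKey, bytes] = {}
table[("user#42", "profile:info", 3)] = encode({"name": "Ada", "langs": ["en", "fr"]})
table[("user#17", "profile:info", 5)] = encode({"name": "Lin"})

# Iterating in key order mimics the sorted-map view of the data.
for row, column, timestamp in sorted(table):
    print(row, column, timestamp, decode(table[(row, column, timestamp)]))
```

The store never inspects the values, so the client has to apply the matching decode step whenever it reads a cell back.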
A concrete example that reflects some of the main design decisions of Bigtable is
the scenario of storing a copy of a large collection of web pages in a single table.
Figure 3.2 illustrates an example of this table, where URLs are used as row keys
and various aspects of web pages as column names. The contents of the web pages
are stored in a single column, which keeps multiple versions of each page under the
timestamps when they were fetched.
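
The cells of Fig. 3.2 can be loaded into a small in-memory structure to make the layout concrete. This is only a sketch: the nested-dictionary representation and the helpers put and versions are hypothetical, the integers 9, 8, 6, 5 and 3 stand in for the symbolic timestamps t9, t8, t6, t5 and t3, and the page body stays elided exactly as in the figure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# row key -> column key -> timestamp -> value (raw bytes)
webtable: Dict[str, Dict[str, Dict[int, bytes]]] = defaultdict(
    lambda: defaultdict(dict)
)


def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    """Store one version of a cell."""
    webtable[row][column][timestamp] = value


def versions(row: str, column: str) -> List[Tuple[int, bytes]]:
    """Return (timestamp, value) pairs of one cell, newest first."""
    cell = webtable.get(row, {}).get(column, {})
    return sorted(cell.items(), key=lambda item: item[0], reverse=True)


PAGE = "<html>…".encode("utf-8")  # page body is elided in the figure

put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
put("com.cnn.www", "anchor:my.look.ca", 8, b"CNN.com")
for ts in (3, 5, 6):
    put("com.cnn.www", "contents:", ts, PAGE)

# Timestamps of the stored page versions, newest first.
print([ts for ts, _ in versions("com.cnn.www", "contents:")])
```

All three page versions live in the same row and the same column; only the timestamp distinguishes them.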
Row keys in a table are arbitrary strings, and every read or write of data
under a single row key is atomic. Bigtable maintains the data in lexicographic order
by row key and dynamically partitions the row range of a table. Each row
range is called a tablet and is the unit of distribution and load balancing.
Thus, reads of short row ranges are efficient and typically require communication
with only a small number of machines. A Bigtable can have an unbounded number of
columns, which are grouped into sets called column families. These column families
form the basic unit of access control. Each cell in a Bigtable can contain
multiple versions of the same data, which are indexed by their timestamps. Each
client can flexibly decide the number n of versions of a cell that need to be kept.
These versions are stored in decreasing timestamp order so that the most recent
versions can always be read first.
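
Two of the mechanisms described above can be sketched in a few lines: locating the tablets that cover a row range, and keeping only the n most recent versions of a cell. The tablet boundaries, the function names, and the list-based cell are hypothetical choices for illustration and do not reflect Bigtable's actual implementation.

```python
import bisect
from typing import List, Tuple

# Hypothetical tablet boundaries: tablet i holds the row keys up to and
# including TABLET_END_KEYS[i]; one final tablet holds all later keys.
TABLET_END_KEYS = ["com.cnn.www", "com.google.www", "org.wikipedia.en"]


def tablet_for(row_key: str) -> int:
    """Index of the tablet whose row range contains row_key."""
    return bisect.bisect_left(TABLET_END_KEYS, row_key)


def tablets_for_range(start_key: str, end_key: str) -> List[int]:
    """Tablets touched by a scan over the row range [start_key, end_key]."""
    return list(range(tablet_for(start_key), tablet_for(end_key) + 1))


Version = Tuple[int, bytes]  # (timestamp, value)


def write_version(cell: List[Version], timestamp: int, value: bytes,
                  max_versions: int) -> None:
    """Add a version, keep newest-first order, drop all but the newest n."""
    cell.append((timestamp, value))
    cell.sort(key=lambda version: version[0], reverse=True)
    del cell[max_versions:]


# A short row range falls inside a single tablet.
print(tablets_for_range("com.d", "com.f"))

# Keep only the two most recent versions of a cell.
cell: List[Version] = []
for ts in (3, 5, 6):
    write_version(cell, ts, f"v{ts}".encode("utf-8"), max_versions=2)
print(cell)
```

Because rows are kept in lexicographic order, a binary search over the boundary keys is enough to find the tablets involved, and a newest-first version list lets a reader stop at the first entry when only the latest value is needed.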