Cabinet, and Memcached can use attached storage devices to store data in RAM
or on disk. Other storage systems keep data in RAM and provide disk backup, or
rely on replication and recovery to avoid the need for backup.
4.3.1.2 Column-Oriented Databases
Column-oriented databases store and process data by column rather than by
row. Both columns and rows are partitioned across multiple nodes to achieve
scalability. Column-oriented databases are mainly inspired by Google's BigTable.
In this section, we first discuss BigTable and then introduce several derivative tools.
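The difference between the two layouts can be illustrated with a minimal sketch. The records, the column names, and the helper to_columns are invented for illustration and do not come from any particular database.

```python
# Row-oriented layout: all attributes of one record are stored together.
rows = [
    {"id": 1, "city": "Oslo", "temp": 4},
    {"id": 2, "city": "Lima", "temp": 19},
    {"id": 3, "city": "Cairo", "temp": 27},
]


def to_columns(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one attribute are stored together.
columns = to_columns(rows)

# An aggregate over a single attribute now reads one column only,
# instead of visiting every record.
average_temp = sum(columns["temp"]) / len(columns["temp"])
```

In a real column store each column (or group of columns) would additionally be partitioned across nodes; the sketch shows only the change of layout.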
BigTable
BigTable is a distributed storage system for structured data, designed to
manage data at very large scale (petabytes) across thousands of commodity
servers [3]. Its basic data structure is a sparse, distributed, persistent,
multidimensional sorted map. The map is indexed by a row key, a column key,
and a timestamp, and every value in the map is an uninterpreted array of
bytes. Row keys are arbitrary strings of up to 64 KB. Rows are stored in
lexicographical order of their keys and are dynamically partitioned into
Tablets, which are the units of distribution and load balancing. As a result,
reading a short range of rows is efficient, since it typically involves
communication with only a small number of machines. Column keys are grouped
by key prefix into sets called column families, which are the basic units of
access control. Timestamps are 64-bit integers that distinguish different
versions of a cell value. Clients may configure how many versions of a cell
are kept. Versions are stored in descending order of timestamp, so the most
recent version is read first.
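The data model described above can be sketched as a small in-memory structure. This is only a toy model under stated assumptions: the class ToyTable, its methods, and the max_versions parameter are hypothetical names chosen for illustration, not part of BigTable's API, and nothing here is distributed or persistent.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Hypothetical model of the map (row key, column key, timestamp) -> bytes."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions   # versions kept per cell (assumed knob)
        self._row_keys = []                # row keys in lexicographic order
        self._cells = defaultdict(dict)    # row -> {column -> [(ts, value), ...]}

    def put(self, row, column, timestamp, value):
        if row not in self._cells:
            bisect.insort(self._row_keys, row)
        versions = self._cells[row].setdefault(column, [])
        versions.append((timestamp, value))
        # Keep versions in descending timestamp order and drop the oldest.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column):
        """Return the most recent version of a cell, or None if absent."""
        versions = self._cells.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def versions(self, row, column):
        """Return all retained (timestamp, value) pairs, newest first."""
        return list(self._cells.get(row, {}).get(column, []))

    def scan(self, start_row, end_row):
        """Return row keys in [start_row, end_row) in lexicographic order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, end_row)
        return self._row_keys[lo:hi]


def column_family(column):
    """Column keys are assumed to look like 'family:qualifier'."""
    return column.split(":", 1)[0]
```

Because the row keys are kept sorted, a scan over a short key range touches one contiguous slice of the key list. This is the property that lets a partitioned system serve such a scan from a small number of Tablets.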
The BigTable API provides functions for creating and deleting tables and
column families, and for changing the metadata of clusters, tables, and column
families, such as access control rights. Client applications may write or
delete values in BigTable, look up values from individual rows, or iterate
over a subset of the data in a table. BigTable also supports other features,
such as single-row transactions, which users may employ to perform more
complex processing on data.
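A single-row transaction can be pictured as an atomic read-modify-write on the columns of one row. The following is a minimal single-process sketch, assuming one lock per row; the class RowStore and its methods are hypothetical and are not the BigTable client API.

```python
import threading
from collections import defaultdict


class RowStore:
    """Hypothetical store offering atomic read-modify-write on a single row."""

    def __init__(self):
        self._rows = defaultdict(dict)              # row -> {column -> value}
        self._locks = defaultdict(threading.Lock)   # one lock per row
        self._locks_guard = threading.Lock()        # protects the lock table

    def _lock_for(self, row):
        with self._locks_guard:
            return self._locks[row]

    def mutate_row(self, row, mutate):
        """Apply mutate(columns) atomically with respect to the same row."""
        with self._lock_for(row):
            working = dict(self._rows[row])   # work on a private copy
            mutate(working)                   # if this raises, nothing is committed
            self._rows[row] = working

    def read_row(self, row):
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))


def increment_hits(columns):
    """Example mutation: bump a counter stored in an assumed 'stats:hits' column."""
    columns["stats:hits"] = columns.get("stats:hits", 0) + 1


def run_demo(threads=8, per_thread=100):
    """Increment the same counter from several threads and return its value."""
    store = RowStore()

    def worker():
        for _ in range(per_thread):
            store.mutate_row("com.cnn.www", increment_hits)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return store.read_row("com.cnn.www")["stats:hits"]
```

The sketch only guarantees atomicity within one row. Mutations that span several rows are not covered, which matches the single-row scope stated above.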
BigTable builds on several pieces of Google infrastructure, including
GFS [5], a cluster management system, the SSTable file format, and
Chubby [11]. GFS is used to store data and log files. The cluster management
system is responsible for scheduling jobs, managing resources on shared
machines, handling machine failures, and monitoring machine status. The
SSTable file format is used internally to store BigTable data. An SSTable
provides a persistent, ordered, immutable map from keys to values, where both
keys and values are arbitrary byte strings. BigTable uses Chubby for the
following tasks: (1) ensuring that there is at most one active Master at any
time; (2) storing the bootstrap location of