Big Data Storage - Big Data: Related Technologies, Challenges and Future Prospects


Cabinet, and Memcached can use attached storage devices to store data in RAM

or on disk. Other storage systems keep data in RAM and provide disk backup, or rely

on replicas and replica recovery to avoid the need for backup.

4.3.1.2 Column-Oriented Databases

Column-oriented databases store and process data by column rather than by

row. Both columns and rows are partitioned across multiple nodes to achieve

scalability. Column-oriented databases are mainly inspired by Google's BigTable.

In this section, we first discuss BigTable and then introduce several derivative tools.

BigTable

BigTable is a distributed, structured data storage system designed to

manage large-scale (petabyte-class) data across thousands of commodity servers [ 3 ].

The basic data structure of BigTable is a sparse, distributed, persistent,

multidimensional sorted map. The map is indexed by a row key, a column key,

and a timestamp, and every value in the map is an

uninterpreted byte array. Row keys in BigTable are arbitrary strings of up to

64 KB; rows are stored in lexicographic order of their keys and

are dynamically partitioned into Tablets, i.e., the units of distribution and load balancing.

As a result, reading a short range of rows is highly efficient, since it only involves

communication with a small number of machines. Column keys are grouped

into sets called column families, identified by the prefix of the column key. These

column families are the basic units of access control. The timestamps are

64-bit integers that distinguish different versions of a cell value. Clients may flexibly

determine the number of cell versions to be stored. These versions are sorted

in descending order of timestamp, so that the latest version is always read first.
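The data model described above can be illustrated with a small in-memory sketch. This is not BigTable's implementation or API: the names `ToyTable`, `column_family`, `put`, `get`, `versions`, and `scan`, as well as the `family:qualifier` key layout used in the example, are assumptions made for illustration. The sketch only shows a map indexed by row key, column key, and timestamp, with rows kept in lexicographic order and cell versions kept newest first.

```python
import bisect
from collections import defaultdict


def column_family(column_key):
    """Return the family prefix of a 'family:qualifier' column key (assumed layout)."""
    return column_key.split(":", 1)[0]


class ToyTable:
    """Toy in-memory model of a (row, column, timestamp) -> bytes map."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # number of versions kept per cell
        self._row_keys = []               # row keys in lexicographic order
        self._cells = defaultdict(dict)   # row -> {column: [(ts, value), ...]}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell; values are uninterpreted bytes."""
        if row not in self._cells:
            bisect.insort(self._row_keys, row)
        versions = self._cells[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                    # drop the oldest

    def versions(self, row, column):
        """Return all stored (timestamp, value) pairs, newest first."""
        return list(self._cells.get(row, {}).get(column, []))

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is absent."""
        stored = self.versions(row, column)
        return stored[0][1] if stored else None

    def scan(self, start_row, end_row):
        """Return the row keys in [start_row, end_row) in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, end_row)
        return self._row_keys[lo:hi]
```

Because the row keys are kept sorted, a short range scan touches only a contiguous slice of keys, which is the property that lets a real system serve such a scan from a small number of Tablets.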

The BigTable API supports the creation and deletion of tables and column

families, as well as modification of the metadata of clusters, tables, and column families,

such as access control rights. Client applications may write or delete values in BigTable,

look up values from individual rows, or iterate over a subset of the data in a table. BigTable also

supports some other features, such as transaction processing within a single row.

Users may utilize such features to conduct more complex processing on data.
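To make the idea of a single-row transaction concrete, the following sketch performs an atomic read-modify-write on one row using a per-row lock. It is a simplified, single-process illustration under stated assumptions, not the BigTable client API: `RowStore`, `mutate_row`, `read`, and `increment` are hypothetical names, and a real distributed store would coordinate this on the server that owns the row.

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy store in which all changes to one row are applied atomically."""

    def __init__(self):
        self._rows = defaultdict(dict)             # row -> {column: value}
        self._locks = defaultdict(threading.Lock)  # one lock per row
        self._guard = threading.Lock()             # protects the two dicts above

    def _row_and_lock(self, row):
        with self._guard:
            return self._rows[row], self._locks[row]

    def mutate_row(self, row, mutate):
        """Run mutate(columns) while holding the lock of this row only."""
        columns, lock = self._row_and_lock(row)
        with lock:
            mutate(columns)

    def read(self, row, column):
        """Return the current value of a cell, or None if it is absent."""
        columns, lock = self._row_and_lock(row)
        with lock:
            return columns.get(column)


def increment(store, row, column):
    """Atomic read-modify-write confined to a single row."""
    def bump(columns):
        columns[column] = columns.get(column, 0) + 1
    store.mutate_row(row, bump)
```

Concurrent callers that increment the same cell never lose an update, while operations on different rows do not block each other. Nothing in this sketch provides atomicity across several rows.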

BigTable is built on several fundamental components of Google's infrastructure, including

GFS [ 5 ], a cluster management system, the SSTable file format, and Chubby [ 11 ]. GFS is

used to store data and log files. The cluster management system is responsible for task

scheduling, management of shared resources on machines, handling of machine

failures, and monitoring of machine status. The SSTable file format is used to store

BigTable data internally. An SSTable provides a persistent, ordered,

immutable map from keys to values, where both keys and values are arbitrary byte

strings. BigTable utilizes Chubby for the following tasks: (1) ensure there

is at most one active Master copy at any time; (2) store the bootstrap location of
