A table is organized into rows and columns; columns can be grouped into
column families, which allow specific optimizations for access control,
storage, and indexing of data. A simple data access model forms the
interface for client applications, which can address data at the granularity
of a single column of a row. Moreover, each column value is stored in
multiple versions that can be time-stamped automatically by BigTable or
explicitly by the client applications.
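
The addressing scheme described above can be illustrated with a small
in-memory sketch. This is not BigTable's API: the class VersionedTable, its
put and get methods, and the "family:qualifier" column naming are
hypothetical names chosen for illustration only.

```python
import time
from collections import defaultdict


class VersionedTable:
    """Toy model: (row key, "family:qualifier") -> {timestamp: value}."""

    def __init__(self):
        # Only cells that have been written occupy space.
        self._cells = defaultdict(dict)

    def put(self, row, column, value, timestamp=None):
        # If the client supplies no timestamp, the store assigns one.
        if timestamp is None:
            timestamp = time.time_ns()
        self._cells[(row, column)][timestamp] = value
        return timestamp

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest at or before timestamp."""
        versions = self._cells.get((row, column), {})
        if timestamp is not None:
            versions = {ts: v for ts, v in versions.items() if ts <= timestamp}
        if not versions:
            return None
        return versions[max(versions)]
```

A client would write a cell with put("com.example", "contents:html", page)
and later read either the latest version or the version that was current at
a given time stamp.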
The objective of Google's BigTable was to develop a relatively simple
storage management system that could provide fast access to petabytes of
data, potentially redundantly distributed across thousands of machines.
Physically, BigTable resembles a B-tree index-organized table in which
branch and leaf nodes are distributed across multiple machines. As in a
B-tree, nodes split as they grow, and because nodes are distributed, this
allows for high scalability across large numbers of machines. Data elements
in BigTable are identified by a primary key, a column name, and, optionally,
a time stamp. Lookups via primary key are predictable and relatively fast.
BigTable provides the data storage mechanism for Google App Engine.
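
The split-as-you-grow behavior can be sketched in a few lines. The example
below is a single-process simplification, not Google's implementation: the
class RangePartitionedIndex and the max_keys threshold are hypothetical, and
each sorted partition stands in for a node that could live on a separate
machine.

```python
import bisect


class RangePartitionedIndex:
    """Toy model: ordered key partitions that split when they grow too large."""

    def __init__(self, max_keys=4):
        self.max_keys = max_keys
        # Partitions are kept in key order; each one is a sorted list of keys.
        self.partitions = [[]]

    def _locate(self, key):
        # Partition i (i >= 1) starts at its first key; partition 0 takes
        # every key smaller than the first key of partition 1.
        starts = [p[0] for p in self.partitions[1:]]
        return bisect.bisect_right(starts, key)

    def insert(self, key):
        i = self._locate(key)
        part = self.partitions[i]
        pos = bisect.bisect_left(part, key)
        if pos < len(part) and part[pos] == key:
            return  # key already present
        part.insert(pos, key)
        if len(part) > self.max_keys:
            # Split the oversized partition into two halves, preserving order.
            mid = len(part) // 2
            self.partitions[i:i + 1] = [part[:mid], part[mid:]]
```

Because a split touches only the partition that overflowed, the other
partitions are unaffected, which is what lets capacity grow by adding
machines rather than by enlarging one node.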
Data are stored in BigTable as a sparse, distributed, persistent,
multidimensional sorted map, which is indexed by a row key, a column key,
and a time stamp. Rows in a BigTable are maintained in order by row key,
and row ranges, called tablets, are the unit of distribution and load
balancing. Each cell of data in a BigTable can contain multiple versions
indexed by time stamp. BigTable uses the Google File System (GFS) to store
both data and log files. The API for BigTable is flexible, providing data
management functions such as creating and deleting tables, and data
manipulation functions by row key, including operations to read, write, and
modify data. Index information for BigTables utilizes tablet information
stored in structures similar to a B-tree. MapReduce applications can be used
with BigTable to process and transform data, and Google has implemented many
large-scale applications that utilize BigTable for storage, including
Google Earth (Tables 17.1 and 17.2).
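
To make the sorted-map view concrete, the following minimal sketch keeps
keys of the form (row key, column key, time stamp) in sorted order and
scans a contiguous row range, which is the kind of range a tablet would
hold. The class SortedMap and its write and scan methods are hypothetical
and do not correspond to the BigTable API; persistence and distribution are
omitted.

```python
import bisect


class SortedMap:
    """Toy sorted map keyed by (row key, column key, timestamp)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, timestamp) tuples
        self._values = {}  # sparse: only cells that were written are stored

    def write(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_row, end_row):
        """Yield (key, value) pairs with start_row <= row < end_row, in order."""
        # A 1-tuple sorts before every full key that shares its row key.
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        for key in self._keys[lo:hi]:
            yield key, self._values[key]
```

Because keys are kept sorted by row key first, all cells of one row, and of
any row range, are adjacent, so a range scan is a single contiguous slice.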
17.3 Hadoop
Hadoop is an open-source software project sponsored by the Apache
Software Foundation. Following the publication in 2004 of the research
paper describing Google MapReduce, an effort was begun in conjunction