are guaranteed to be completely identical. However, CP systems cannot ensure high availability because of the high cost of consistency assurance. Therefore, CP systems suit scenarios with moderate load but stringent requirements on data accuracy (e.g., trading data). BigTable and HBase are two popular CP systems.
BigTable is well known because it successfully manages the backend data of Google's search engine. Because much of Google's data is structured, BigTable mainly stores data in tables. However, as more information is put into a table, the table grows and must be partitioned and stored separately; such tables are also usually highly sparse. BigTable therefore divides the columns into Column Families, where every column family stores the same type of information. In this way, similar data is stored together and the same type of information is processed in the same manner, which simplifies use of the system. Within a column family, new columns can be inserted arbitrarily, which greatly relaxes the usage restrictions of BigTable.
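The column-family model described above can be sketched as follows. This is an illustrative Python data structure, not BigTable's actual API; the class and method names are assumptions made for the example.

```python
# Sketch of a sparse table whose columns are grouped into column families
# (illustrative only; real BigTable exposes a different client API).
from collections import defaultdict

class SparseTable:
    def __init__(self, families):
        self.families = set(families)          # fixed at table creation
        # row_key -> family -> qualifier -> value (only stored cells use memory)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # A new column (qualifier) needs no schema change.
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier):
        return self.rows[row_key][family].get(qualifier)

table = SparseTable(families=["anchor", "contents"])
table.put("com.example.www", "anchor", "cnn.com", "CNN")
table.put("com.example.www", "anchor", "bbc.com", "BBC")   # inserted arbitrarily
print(table.get("com.example.www", "anchor", "cnn.com"))   # -> CNN
```

Because only the cells that actually hold data are stored, a highly sparse table costs little memory, and columns of the same family stay together.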
BigTable is designed in a way similar to GFS, Google's distributed file system: one Master and several Tablet Servers form a star structure. A star structure has a single point of failure, so the load on the Master server should be kept low to minimize Master errors. In BigTable, data transmission and data addressing do not involve the Master, so its load is not high. To address the single point of failure itself, BigTable adopts a Master election mechanism: an asynchronous, consistent locking mechanism based on the Paxos protocol [3] ensures that exactly one Master is elected each time.
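The "exactly one Master" guarantee rests on an exclusive lock: every candidate tries to acquire it, and only the one that succeeds becomes Master. The real system backs the lock with a Paxos-based lock service; the single-process sketch below only illustrates the first-to-acquire-wins idea, and all names are hypothetical.

```python
# Simplified lock-based master election (illustration only; the production
# system uses a distributed, Paxos-backed lock service, not threading.Lock).
import threading

lock = threading.Lock()
master = None

def try_become_master(name):
    global master
    # Non-blocking acquire: at most one candidate can succeed.
    if lock.acquire(blocking=False):
        master = name                      # this candidate is the Master

candidates = [threading.Thread(target=try_become_master, args=(f"server-{i}",))
              for i in range(3)]
for t in candidates:
    t.start()
for t in candidates:
    t.join()
print("elected master:", master)           # exactly one candidate holds the lock
```

The losing candidates can simply watch the lock and re-run the election if the current Master ever releases it (e.g., by crashing).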
Data in BigTable is ordered lexicographically by row key. To insert a record into a sorted table, we must find the insertion position and then shift the existing data to make room for the new record, which is very time-consuming. BigTable uses batch processing to solve this problem. Specifically, BigTable stores data in two tables: a big table for historical data and a very small table for recently modified data. When the recent data accumulates to a certain amount, or after a certain period of time, BigTable merges the recent data into the historical data. This greatly reduces the number of times the big table is modified, since only the small table is modified frequently, and so the cost of data modification drops considerably. This method therefore mitigates the high cost of data changes and also speeds up look-ups of recently modified data.
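The two-table scheme above can be sketched as a minimal key-value store: writes land in a small dictionary, and once it passes a threshold its contents are merged into the big sorted table in one batch. The class and threshold are assumptions made for illustration (the design is in the spirit of an LSM tree, not BigTable's exact implementation).

```python
# Sketch of the two-table write scheme: cheap writes to a small recent
# table, periodic batch merges into a big sorted historical table.
import bisect

class TwoTableStore:
    def __init__(self, threshold=4):
        self.big = []            # sorted list of (key, value): historical data
        self.small = {}          # recently modified data, cheap to update
        self.threshold = threshold

    def put(self, key, value):
        self.small[key] = value
        if len(self.small) >= self.threshold:
            self._merge()        # expensive step happens rarely, in batch

    def get(self, key):
        if key in self.small:    # recently modified data is found fast
            return self.small[key]
        i = bisect.bisect_left(self.big, (key,))
        if i < len(self.big) and self.big[i][0] == key:
            return self.big[i][1]
        return None

    def _merge(self):
        # Merge all recent data into the big table at once, so the costly
        # sorted rebuild is amortized over many writes.
        merged = dict(self.big)
        merged.update(self.small)
        self.big = sorted(merged.items())
        self.small = {}

store = TwoTableStore(threshold=3)
for k, v in [("b", 2), ("a", 1), ("c", 3), ("d", 4)]:
    store.put(k, v)
print(store.get("a"), store.get("d"))   # -> 1 4
```

Each individual `put` touches only the small table; the big sorted table is rebuilt once per `threshold` writes instead of once per write.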
AP systems also ensure partition tolerance, but differ from CP systems in that they additionally guarantee availability. In exchange, AP systems provide only eventual consistency rather than the strong consistency of the previous two types of systems. Therefore, AP systems apply only to scenarios with frequent requests but not very high requirements on accuracy. For example, in online SNS (Social Networking Service) systems, there are many concurrent visits to the data, but a certain amount of data error is tolerable. Furthermore, because AP systems guarantee eventual consistency, accurate data can still be obtained after a certain delay. AP systems may therefore also be used when real-time requirements are not stringent.
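The trade-off can be seen in a toy model of eventual consistency (all names here are hypothetical): a write is acknowledged after updating one replica and propagates to the others asynchronously, so a read served by a lagging replica briefly returns stale data, yet all replicas agree once propagation completes.

```python
# Toy model of eventual consistency: writes are acknowledged immediately
# (availability) and replicated asynchronously (only eventual consistency).
class Replica:
    def __init__(self):
        self.data = {}

replicas = [Replica() for _ in range(3)]
pending = []   # asynchronous replication queue

def write(key, value):
    replicas[0].data[key] = value          # acknowledged right away
    pending.append((key, value))           # replicated later

def read(replica_index, key):
    return replicas[replica_index].data.get(key)

def propagate():
    # Runs "later": applies every queued write to every replica.
    while pending:
        key, value = pending.pop(0)
        for r in replicas:
            r.data[key] = value

write("likes", 42)
print(read(2, "likes"))    # -> None (stale read before propagation)
propagate()
print(read(2, "likes"))    # -> 42  (replicas have converged)
```

The stale `None` read is exactly the "certain amount of data error" an SNS workload tolerates; waiting until after `propagate()` corresponds to reading after the consistency delay.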