Figure 12-33 A Generalized Structured Storage System
(a) A Column: Name: LastName; Value: Able; Timestamp: 40324081235
(b) A Super Column: Super Column Name: CustomerName; Super Column Values: FirstName = Ralph,
    LastName = Able (Timestamp: 40324081235 for each column)
(c) A Column Family: Column Family Name: Customer
    RowKey001: FirstName = Ralph; LastName = Able (Timestamp: 40324081235)
    RowKey002: FirstName = Nancy; LastName = Jacobs; Phone = 817-871-8123; City = Fort Worth
               (Timestamp: 40335091055)
    RowKey003: LastName = Baker; EmailAddress = Susan.Baker@elswhere.com (Timestamp: 40340103518)
Finally, all the column families are contained in a keyspace, which provides the set of
RowKey values that can be used in the data store. Figure 12-33(c) shows RowKey values from the
keyspace being used to identify each row in a column family. While this structure may seem odd
at first, in practice it allows great flexibility: new columns can be added to hold new data at any
time, without modifying an existing table structure. For example, in Figure 12-33(c) the row
RowKey002 has Phone and City columns that RowKey001 does not have. Of course, there is more to
structured storage than we have discussed here, but you should now understand its basic
principles.
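The keyspace, column family, row, and column hierarchy can be sketched in a few lines of Python. This is a minimal in-memory illustration only, not how any particular product stores data; the class names Column and Keyspace and the methods put and row are hypothetical. It loads the rows from Figure 12-33(c) and shows that each row carries only the columns written to it, so a new column never requires a schema change.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """One name/value pair plus the time it was written, as in Figure 12-33(a)."""
    name: str
    value: str
    timestamp: int


class Keyspace:
    """Hypothetical in-memory container: family name -> RowKey -> column name -> Column."""

    def __init__(self) -> None:
        self.families: dict[str, dict[str, dict[str, Column]]] = {}

    def put(self, family: str, row_key: str, column: Column) -> None:
        # Families and rows are created on first use, so adding a new column
        # to one row never changes any other row (no table structure to alter).
        rows = self.families.setdefault(family, {})
        rows.setdefault(row_key, {})[column.name] = column

    def row(self, family: str, row_key: str) -> dict[str, Column]:
        # Return the columns stored for one row, or an empty dict if it is absent.
        return self.families.get(family, {}).get(row_key, {})


# Load the Customer column family shown in Figure 12-33(c).
ks = Keyspace()
ts1, ts2, ts3 = 40324081235, 40335091055, 40340103518
ks.put("Customer", "RowKey001", Column("FirstName", "Ralph", ts1))
ks.put("Customer", "RowKey001", Column("LastName", "Able", ts1))
for name, value in [("FirstName", "Nancy"), ("LastName", "Jacobs"),
                    ("Phone", "817-871-8123"), ("City", "Fort Worth")]:
    ks.put("Customer", "RowKey002", Column(name, value, ts2))
ks.put("Customer", "RowKey003", Column("LastName", "Baker", ts3))
ks.put("Customer", "RowKey003", Column("EmailAddress", "Susan.Baker@elswhere.com", ts3))

print(sorted(ks.row("Customer", "RowKey003")))  # ['EmailAddress', 'LastName']
```

Using nested dictionaries keyed by RowKey and column name mirrors the figure directly: the "table" has no fixed column list, and each row is simply whatever set of columns has been written to it.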
MapReduce
While structured storage provides the means to store data in a Big Data system, the data
themselves are analyzed using the MapReduce process. Because Big Data involve extremely large
data sets, it is impractical for a single computer to process the data by itself. Therefore, a cluster
of computers is used, with a distributed processing system similar to the distributed database
system discussed previously in this chapter.
The MapReduce process breaks a large analytical task into smaller tasks, assigns each
smaller task to a separate computer in the cluster, gathers the results of those tasks, and
combines them into the final result of the original task. The term Map refers to the work done on
each individual computer, and the term Reduce refers to combining the individual results into the
final result.
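The split, map, and combine steps can be illustrated with the classic word-count task. This is a minimal single-machine sketch under stated assumptions: a thread pool stands in for the separate computers in a cluster, the function names map_task, shuffle, reduce_task, and map_reduce are hypothetical, and the grouping ("shuffle") step between Map and Reduce is a common MapReduce convention rather than something described in the text above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk: list[str]) -> list[tuple[str, int]]:
    """Map: the work done on one 'computer'; emits (word, 1) for each word in its chunk."""
    return [(word, 1) for line in chunk for word in line.lower().split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group every partial result by key so each key can be reduced independently."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the individual results for one key into a final value."""
    return key, sum(values)


def map_reduce(lines: list[str], workers: int = 3) -> dict[str, int]:
    # Split the large task into smaller ones: one chunk of input per simulated node.
    chunks = [lines[i::workers] for i in range(workers)]
    # Run the Map step on each chunk in parallel (threads stand in for cluster nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, chunks))
    # Gather the partial results and combine them into the final product.
    groups = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in groups.items())


docs = ["the cat sat", "the dog sat", "a cat ran"]
counts = map_reduce(docs)
print(counts["the"], counts["cat"], counts["sat"])  # 2 2 2
```

Each map_task call sees only its own chunk and needs nothing from the others, which is what allows the Map step to be spread across many machines; only the much smaller partial results must be brought together for the Reduce step.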
 