Say you need to upgrade the RAM on one of your six servers, a slave that provides
replication for a master node. You shut down the server, install the RAM, and restart
it. While the slave server was down, additional transactions were processed and
now need to be replicated. Copying the entire dataset would be inefficient. With
hash trees, you only need to check which directories and files have new hash values
and synchronize those files, and you're done.
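The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements replication: the nested-dict layout (directories as dicts, files as bytes), the function names tree_hash and changed_paths, and the sample file names are all assumptions made for the example. It also recomputes hashes on every call, where a real system would cache them, and it ignores files that exist only on the replica.

```python
import hashlib

def tree_hash(node):
    """Hash a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        # Tag files so an empty file never hashes like an empty directory.
        return hashlib.sha256(b"file:" + node).hexdigest()
    h = hashlib.sha256(b"dir:")
    for name in sorted(node):
        # A directory's hash depends on each child's name and hash.
        h.update(name.encode("utf-8"))
        h.update(tree_hash(node[name]).encode("ascii"))
    return h.hexdigest()

def changed_paths(master, replica, path=""):
    """Return paths of master files that are new or differ on the replica."""
    # Matching hashes mean the whole subtree is identical, so skip it.
    if replica is not None and tree_hash(master) == tree_hash(replica):
        return []
    if isinstance(master, bytes):
        return [path]
    # Directory hashes differ: descend into the children only.
    replica_dir = replica if isinstance(replica, dict) else {}
    out = []
    for name in sorted(master):
        out += changed_paths(master[name], replica_dir.get(name), f"{path}/{name}")
    return out

# Hypothetical data: the replica missed one update and one new file.
master = {
    "db": {"a.dat": b"1", "b.dat": b"2", "c.dat": b"3"},
    "log": {"tx.log": b"x"},
}
replica = {
    "db": {"a.dat": b"1", "b.dat": b"old"},
    "log": {"tx.log": b"x"},
}

to_sync = changed_paths(master, replica)
print(to_sync)
```

Because the log directory hashes match on both sides, the sketch never looks inside it; only the two files under db are reported for synchronization.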
As you've seen, distributed revision control systems are important in today's work
environment for database as well as software development scenarios. The ability to syn-
chronize data by reconnecting to a network and merging changes saves valuable time
and money for organizations and allows them to focus on other business concerns.
3.8 Apply your knowledge
Sally is working on a project that uses a NoSQL document database to store product
reviews for hundreds of thousands of products. Since products and product reviews
have many different types of attributes, Sally agrees that a document store is ideal for
storing this high-variability data. In addition, the business unit needs full-text search
capability, which the document store also provides.
The business unit has come to Sally: they want to perform aggregate analysis on a
subset of all the properties that have been standardized across the product reviews.
The analysis needs to show total counts and averages for different categories of prod-
ucts. Sally has two choices. She can use the aggregate functions supplied by the
NoSQL document database or she can create a MapReduce job to summarize the data
and then use existing OLAP software to do the analysis.
Sally realizes that both options require about the same amount of programming
effort. But the OLAP solution allows more flexible ad hoc query analysis using a pivot-
table-like interface. She decides to use a MapReduce transform to create a fact table
and dimension tables, and then builds an OLAP cube from the star schema. In the
end, product managers can create ad hoc reports on product reviews using the same
tools they use for product sales.
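To make the MapReduce step concrete, here is a small in-memory sketch of a job that summarizes reviews into per-category counts and averages, roughly the kind of rows that could feed a fact table. It is an illustration under assumptions: the field names (category, rating), the function names, and the sample reviews are hypothetical, and a real job would run on the document database's own MapReduce framework rather than in a single Python process.

```python
from collections import defaultdict

# Hypothetical review documents; only standardized properties are used.
reviews = [
    {"product_id": "p1", "category": "camera", "rating": 4},
    {"product_id": "p2", "category": "camera", "rating": 5},
    {"product_id": "p3", "category": "phone", "rating": 3},
]

def map_review(review):
    """Map step: emit (category, (count, rating)) for one review."""
    yield review["category"], (1, review["rating"])

def reduce_category(category, values):
    """Reduce step: combine all values for one category into a summary row."""
    count = sum(c for c, _ in values)
    rating_sum = sum(r for _, r in values)
    return {
        "category": category,
        "review_count": count,
        "rating_sum": rating_sum,
        "avg_rating": rating_sum / count,
    }

def run_job(docs):
    """Group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_review(doc):
            groups[key].append(value)
    return [reduce_category(k, v) for k, v in sorted(groups.items())]

summary_rows = run_job(reviews)
for row in summary_rows:
    print(row)
```

Keeping both the count and the rating sum in each row is a deliberate choice in this sketch: an OLAP tool can then recompute correct averages when it rolls categories up, which is not possible from averages alone.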
This example shows that NoSQL systems may be ideal for some data tasks, but they
may not have the same features as a traditional table-centric OLAP system for some
analyses. Here, Sally combined parts of a new NoSQL approach with a traditional
OLAP tool to get the best of both worlds.
3.9 Summary
In this chapter, we reviewed many of the existing features of RDBMSs, as well as their
strengths and weaknesses. We looked at how relational databases use the concept of
joins between tables and the challenge this can present when scalability across multi-
ple systems is desired.
We reviewed how the large integration costs of siloed systems drove RDBMS ven-
dors to create larger centralized systems that allowed up-to-date integrated reporting
with fine-grained access control. We also reviewed how online analytical systems allow