detect problems before they impact your customers. Cost-effective NoSQL systems
can be part of good operations management solutions.
Full-text documents —This category of data includes any document that contains
natural-language text, such as English. An important aspect of document
stores is that you can query the entire contents of your office documents in
the same way you would query rows in your SQL system.
This means that you can create new reports that combine traditional data in
RDBMSs with the data within your office documents. For example, you
could create a single query that extracts all the authors and titles of PowerPoint
slides that contain the keywords NoSQL or big data. The resulting list of
authors could then be filtered against a list of job titles in the HR database to
show which people have the title of Data Architect or Solution Architect.
This is a good example of how organizations are trying to tap into the hidden
skills that already exist within an organization for training and mentorship.
Integrating documents into what can be queried is opening new doors in
knowledge management and efficient staff utilization.
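The two-step query described above (find authors of slides that mention certain keywords, then filter them by job title from HR data) can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the field names (author, title, text), and the helper functions are hypothetical, and a real document store would run this as a query rather than as a loop over in-memory data.

```python
# Hypothetical sample data: the field names and records are illustrative
# assumptions, not taken from any real document store or HR system.
slides = [
    {"author": "Ann Lee", "title": "Intro to NoSQL",
     "text": "Why NoSQL matters for big data"},
    {"author": "Bob Ray", "title": "Q3 Budget",
     "text": "Quarterly spending review"},
    {"author": "Cy Dunn", "title": "Scaling Out",
     "text": "Handling big data on commodity hardware"},
]
hr_titles = {
    "Ann Lee": "Data Architect",
    "Bob Ray": "Accountant",
    "Cy Dunn": "Sales Manager",
}

KEYWORDS = ("nosql", "big data")
TARGET_TITLES = {"Data Architect", "Solution Architect"}


def authors_mentioning(docs, keywords):
    """Return the set of authors whose slide title or text contains any keyword."""
    found = set()
    for doc in docs:
        content = (doc["title"] + " " + doc["text"]).lower()
        if any(keyword in content for keyword in keywords):
            found.add(doc["author"])
    return found


def filter_by_job_title(authors, hr, wanted):
    """Keep only the authors whose HR job title is in the wanted set."""
    return sorted(a for a in authors if hr.get(a) in wanted)


# Step 1: full-text match over the documents.
matching_authors = authors_mentioning(slides, KEYWORDS)
# Step 2: filter the authors using the traditional (HR) data.
architects = filter_by_job_title(matching_authors, hr_titles, TARGET_TITLES)
print(architects)
```

With this sample data, two authors match the keywords, but only one of them holds an architect title in the HR data.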
As you can see, you might encounter many different flavors of big data. As we move
forward, you'll see how using a shared-nothing architecture can help you with most of
your big data problems, whether they're read-mostly or read/write data.
6.5 Analyzing big data with a shared-nothing architecture
There are three ways that resources can be shared between computer systems: shared
RAM, shared disk, and shared-nothing. Figure 6.6 shows a comparison of these three
distributed computing architectures.
Of the three alternatives, a shared-nothing architecture is the most cost-effective in
terms of cost per processor when you're using commodity hardware. As we continue,
[Figure 6.6 panels, left to right: Shared RAM (CPUs reach RAM over a bus), Shared disk (nodes on a LAN share disks over a SAN), Shared-nothing (nodes with their own CPU, RAM, and disk on a LAN)]
Figure 6.6 Three ways to share resources. The left panel shows a
shared RAM architecture, where many CPUs access a single shared
RAM over a high-speed bus. This system is ideal for large graph
traversal. The middle panel shows a shared disk system, where
processors have independent RAM but share disk using a storage area
network (SAN). The right panel shows an architecture used in big data
solutions: cache-friendly, using low-cost commodity hardware, and a
shared-nothing architecture.
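To make the shared-nothing idea concrete, the following minimal sketch simulates it in a single process: each record is assigned to exactly one node by a stable hash of its key, each node aggregates only the data it owns, and the partial results are merged at the end. The record values, the node count, and the function names are assumptions chosen for illustration. Real systems would run each node on a separate machine and exchange partial results over the LAN.

```python
from collections import Counter
from zlib import crc32


def node_for(key, node_count):
    """Pick the node that owns a key, using a stable hash of the key."""
    return crc32(key.encode("utf-8")) % node_count


def partition(records, node_count):
    """Assign every (key, value) record to exactly one node's private storage."""
    nodes = [[] for _ in range(node_count)]
    for key, value in records:
        nodes[node_for(key, node_count)].append((key, value))
    return nodes


def local_totals(node_records):
    """Work done by one node, touching only the records it owns."""
    totals = Counter()
    for key, value in node_records:
        totals[key] += value
    return totals


def merge(partials):
    """Combine the per-node results; this is the only cross-node step."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return combined


# Hypothetical records: (key, value) pairs invented for this sketch.
records = [("web", 3), ("db", 5), ("web", 2), ("cache", 1), ("db", 4)]
nodes = partition(records, node_count=3)
partials = [local_totals(node) for node in nodes]
result = merge(partials)
print(dict(result))
```

Because all records with the same key land on the same node, no node needs to read another node's RAM or disk while computing its partial totals. Here the merged totals are 5 for web, 9 for db, and 1 for cache.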
 