Emerging Database Landscape - Big Data Imperatives


In the shared-nothing model, each server contains a portion of the database, and
no server contains the entire database. The model is designed to process as much
data as possible at each node and to share data between nodes only when necessary.
Although the database runs independently on multiple nodes, it appears as a single
entity to any application. This model resolves the core limitation facing single
and clustered servers: the I/O bottleneck. Adding a node to a shared-nothing
database increases the processors and memory available and, more importantly,
the disk bandwidth as well. A group of small servers can easily outstrip the total
I/O throughput of a very large server or a shared-disk cluster.
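The idea can be illustrated with a small sketch. It is a toy model under stated assumptions, not a description of any particular product: the `Node` and `ShardedTable` classes, the CRC32-based placement rule, and the `amount` column are all hypothetical names chosen for illustration. Each node keeps a private slice of the rows, an aggregate is computed locally on every node, and only the small partial results are combined, while the application sees a single table.

```python
import zlib


class Node:
    """One shared-nothing server (hypothetical): owns a private slice of the rows."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # local storage; other nodes never read this list

    def insert(self, row):
        self.rows.append(row)

    def local_sum(self, column):
        # The work happens where the data lives; only one number leaves the node.
        return sum(row[column] for row in self.rows)


class ShardedTable:
    """Presents a group of nodes to the application as a single table."""

    def __init__(self, node_count):
        self.nodes = [Node(f"node{i}") for i in range(node_count)]

    def _owner(self, key):
        # Assumed placement rule: a stable hash of the key picks the owning node.
        index = zlib.crc32(str(key).encode()) % len(self.nodes)
        return self.nodes[index]

    def insert(self, key, row):
        self._owner(key).insert({"id": key, **row})

    def total(self, column):
        # Scatter the query to every node, then gather one partial sum per node.
        return sum(node.local_sum(column) for node in self.nodes)


table = ShardedTable(node_count=4)
for order_id in range(1000):
    table.insert(order_id, {"amount": order_id % 10})

print(table.total("amount"))                  # 4500
print([len(node.rows) for node in table.nodes])  # rows held by each node
```

In this sketch, adding a node means adding another independent `rows` list, which mirrors how a new server in a shared-nothing database brings its own disk bandwidth. A real system would also have to move existing rows to the new node, which the sketch leaves out.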

Scaling in this way also lowers the overall hardware cost because commodity servers
can be used. A collection of small servers with the same total amount of processors,
memory, and storage is less expensive than a single large server. See Table 4-4.

Table 4-4. Scale up and scale out considerations

Scaling up a Database Platform

| Scale Up | Scale Out |
|---|---|
| Vertical expansion/upgrade to a more powerful server configuration | Horizontal expansion through a grid or cluster of commodity servers |
| More expensive hardware | Less expensive hardware |
| Eventually hits a limit | Less likely to hit a limit |

We've spent a good deal of time discussing database evolution and the various
database technologies suitable for different types of workloads. Below are a number
of conclusions regarding database architectures:

• Relational databases (RDBMSs) still fit the need for most database
  implementations, but they have reached scalability limits that make them
  either impractical or too expensive for specialized workloads. New entrants
  to the market and alternative approaches are often better suited to
  specific workloads.

• The relational database is still the preferred choice for most applications
  today. Database preferences are changing, however, particularly for new
  applications that have high scalability requirements for data size or user
  concurrency. If you find yourself working with a system that has specific
  needs, let the workload be your primary guide.

• When analyzing the workloads, be sure to consider all the components. For
  example, if you run a consumer-facing website on the database but also want
  to analyze data using machine-learning algorithms, you are dealing with two
  distinct workloads. One requires real-time read-write activity, and the other
  requires read-heavy, computationally intensive activity. These are generally
  incompatible within the same database without careful design considerations.
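One possible way to act on the last point is to keep the two workloads apart and route each operation to a store designed for it. The sketch below is only an illustration under that assumption: `Store`, `WorkloadRouter`, and the workload labels `"transactional"` and `"analytic"` are hypothetical names, and the stores are in-memory stand-ins rather than real databases.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Stand-in for a database tuned to one kind of workload (hypothetical)."""

    name: str
    handled: list = field(default_factory=list)

    def run(self, operation):
        # A real store would execute the operation; here it is only recorded.
        self.handled.append(operation)
        return f"{self.name} handled: {operation}"


class WorkloadRouter:
    """Sends each operation to the store designed for its workload."""

    def __init__(self, transactional, analytic):
        self._stores = {"transactional": transactional, "analytic": analytic}

    def run(self, workload, operation):
        if workload not in self._stores:
            raise ValueError(f"unknown workload: {workload!r}")
        return self._stores[workload].run(operation)


oltp = Store("oltp")        # real-time read-write activity from the website
olap = Store("analytics")   # read-heavy, computationally intensive analysis
router = WorkloadRouter(transactional=oltp, analytic=olap)

router.run("transactional", "update cart for user 17")
router.run("analytic", "train recommendation model on order history")
```

The point of the separation is that the heavy analytic job never competes with the website's real-time reads and writes. How the analytic store is kept up to date is a separate design decision that this sketch does not address.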
