Four Rules for Data Success - Data Just Right: Introduction to Large-Scale Data and Analytics


This rethinking of the database for an era of cheap commodity hardware and the rise of Internet-connected applications has resulted in an explosion of design philosophies for data processing software.

If you are working on providing solutions to your organization's data challenges, the current era is the Era of the Big Data Trade-Off. Developers building new data-driven applications are faced with all manner of design choices. Which database backend should be used: relational, key-value, or something else? Should my organization build it, or should we buy it? How much is this software solution worth to me? Once I collect all of this data, how will I analyze, share, and visualize it?

In practice, a successful data pipeline makes use of a number of different technologies, each optimized for particular use cases. For example, the relational database model is excellent for workloads that track transactions and prioritize data consistency. This is not to say that it is impossible for a relational database to be used in a distributed environment, but once an application reaches the point of needing one, it may be more efficient to use a database that is designed from the beginning for distributed environments.

The use cases in this topic illustrate common examples to help the reader identify and choose the technologies that best fit a particular use case. The revolution in data accessibility is just beginning. Although this topic doesn't aim to cover every available piece of data technology, it does aim to capture the broad use cases and help guide users toward good data strategies.

More importantly, this topic attempts to create a framework for making good decisions when faced with data challenges. At the heart of this framework are several key principles to keep in mind. Let's explore these Four Rules for Data Success.

Build Solutions That Scale (Toward Infinity)

I've lost count of the number of people I've met who have told me that they started looking at new technology for data processing because their relational database had reached the limits of scale. A common pattern for Web application developers is to start developing a project using a single-machine installation of a relational database for collecting, serving, and querying data. This is often the quickest way to develop an application, but it can cause trouble when the application becomes very popular or becomes overwhelmed with data and traffic to the point at which it is no longer acceptably performant.

There is nothing inherently wrong with attempting to scale a relational database using a well-thought-out sharding strategy. Sometimes, choosing a particular technology is a matter of cost or personnel; if your engineers are experts at sharding a MySQL database across a huge number of machines, then it may be cheaper overall to stick with MySQL than to rebuild using a database designed for distributed networks. The point is to be aware of the limitations of your current solution, understand when a scaling limit has been reached, and have a plan to grow in case of bottlenecks.
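
To make the idea of a sharding strategy concrete, here is a minimal sketch of hash-based shard routing in Python. It is only an illustration under stated assumptions: the shard names and the helper functions `shard_for` and `group_by_shard` are hypothetical, and a real MySQL deployment would map each name to an actual database host and connection.

```python
import hashlib

# Hypothetical shard names (an assumption for this sketch); in practice each
# would correspond to a separate database server.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key, so a given key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer, then reduce it
    # to a valid index into the shard list.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by destination shard, for example to issue one batched query per shard."""
    groups = {name: [] for name in shards}
    for key in keys:
        groups[shard_for(key, shards)].append(key)
    return groups
```

Calling `shard_for("alice")` returns the same shard name every time, which is what lets the application find a record again later. The sketch also shows one of the limitations worth planning for: with simple modulo routing, changing the number of shards remaps most keys, so growing the cluster means moving a large share of the data.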

This lesson also applies to organizations that are faced with the challenge of having data managed by different types of software that can't easily communicate or share
