Database Reference
This rethinking of the database for an era of cheap commodity hardware and the
rise of Internet-connected applications has resulted in an explosion of design
philosophies for data processing software.
If you are working on providing solutions to your organization's data challenges,
the current era is the Era of the Big Data Trade-Off. Developers building new
data-driven applications are faced with all manner of design choices. Which database
backend should be used: relational, key-value, or something else? Should my organization
build it, or should we buy it? How much is this software solution worth to me? Once I
collect all of this data, how will I analyze, share, and visualize it?
In practice, a successful data pipeline makes use of a number of different
technologies optimized for particular use cases. For example, the relational database
model is excellent for data that tracks transactions and depends on data consistency.
This is not to say that it is impossible for a relational database to be used in a
distributed environment, but once the data has outgrown a single machine, it may be
more efficient to use a database that is designed from the beginning to be used in
distributed environments.
The use cases in this topic illustrate common examples to help the reader
identify and choose the technologies that best fit a particular use case. The
revolution in data accessibility is just beginning. Although this topic doesn't aim to
cover every available piece of data technology, it does aim to capture the broad use
cases and help guide users toward good data strategies.
More importantly, this topic attempts to create a framework for making good
decisions when faced with data challenges. At the heart of this are several key
principles to keep in mind. Let's explore these Four Rules for Data Success.
Build Solutions That Scale (Toward Infinity)
I've lost count of the number of people I've met who have told me about how they've
started looking at new technology for data processing because their relational database
has reached the limits of scale. A common pattern for Web application developers is
to start developing a project using a single-machine installation of a relational database
for collecting, serving, and querying data. This is often the quickest way to develop
an application, but it can cause trouble when the application becomes very popular
or becomes overwhelmed with data and traffic to the point at which it is no longer
acceptably performant.
There is nothing inherently wrong with attempting to scale up a relational database
using a well-thought-out sharding strategy. Sometimes, choosing a particular technology
is a matter of cost or personnel; if your engineers are experts at sharding a MySQL
database across a huge number of machines, then it may be cheaper overall to stick
with MySQL than to rebuild using a database designed for distributed networks. The
point is to be aware of the limitations of your current solution, understand when a
scaling limit has been reached, and have a plan to grow in case of bottlenecks.
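As a rough illustration of what a sharding strategy involves, here is a minimal Python sketch of hash-based routing, in which each record key is mapped to one of a fixed set of database hosts. The host names, the shard_for and host_for helpers, and the choice of SHA-256 are assumptions made for this example; they do not come from MySQL or any other particular database product.

```python
import hashlib

# Hypothetical shard hosts, invented for this example.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash()
    because hash() on strings can differ between interpreter runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for(key: str) -> str:
    """Return the (hypothetical) host responsible for the given key."""
    return SHARDS[shard_for(key, len(SHARDS))]


if __name__ == "__main__":
    for user_key in ("user:1", "user:2", "user:3"):
        print(user_key, "->", host_for(user_key))
```

With this simple modulo scheme, changing the number of shards reassigns most keys to a different host, which is one reason to plan for growth before a scaling limit is reached.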
This lesson also applies to organizations that are faced with the challenge of
having data managed by different types of software that can't easily communicate or share
 