Database Reference
with one another. These data silos can also hamper the ability of data solutions to
scale. For example, it is practical for accountants to work with spreadsheets, the Web
site development team to build its applications using relational databases, and financial
analysts to use a variety of statistics packages and visualization tools. In these situations, it
can become difficult to ask questions about the data across the variety of software used
throughout the company. For example, answering a question such as “how many of
our online customers have found our product through our social media networks, and
how much do we expect this number to increase if we improve our online
advertising?” would require information from each of these silos.
Indeed, whenever you move from one database paradigm to another, there is an
inherent, and often unknown, cost. A simple example might be the process of moving
from a relational database to a key-value database. Already managed data must be
migrated, software must be installed, and new engineering skills must be developed.
Making smart choices at the beginning of the design process may mitigate these
problems. In Chapter 3, “Building a NoSQL-Based Web App to Collect Crowd-Sourced
Data,” we will discuss the process of using a NoSQL database to build an application
that expects a high volume of traffic from users.
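To make the migration cost concrete, here is a minimal sketch of moving rows from a relational table into a key-value layout. It is an illustration under stated assumptions, not a recommended migration tool: an in-memory SQLite database stands in for the relational system, a plain Python dict stands in for the key-value database, and the `customers` table, its columns, and the `migrate_to_key_value` helper are all hypothetical.

```python
import json
import sqlite3


def migrate_to_key_value(conn, table, key_column):
    """Copy every row of a relational table into a dict used as a key-value store.

    Each row becomes one JSON string stored under the key "<table>:<key value>".
    The table name is interpolated into the SQL, so it must come from trusted code.
    """
    conn.row_factory = sqlite3.Row  # rows can then be converted to dicts
    store = {}
    for row in conn.execute(f"SELECT * FROM {table}"):
        record = dict(row)
        key = f"{table}:{record[key_column]}"
        store[key] = json.dumps(record, sort_keys=True)
    return store


# Hypothetical relational source with two customer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "social"), (2, "Grace", "search")],
)

kv_store = migrate_to_key_value(conn, "customers", "id")
```

Even in this toy version, the trade-off is visible: lookups by key become trivial, but a question such as "which customers came from social media" now requires scanning and decoding every value, because the key-value layout no longer supports ad hoc queries over columns.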
A common theme throughout this book is use cases that combine several
technologies to deal with issues of scale. One technology may
be useful for collecting, another for archiving, and yet another for high-speed analysis.
Build Systems That Can Share Data (On the Internet)
For public data to be useful, it must be accessible. The technological choices made
during the design of systems to deliver this data depend completely on the intended
audience. Consider the task of a government making public data more accessible to
citizens. In order to make data as accessible as possible, data files should be hosted on
a scalable system that can handle many users at once. Data formats should be easy for
researchers to read and easy to generate reports from.
Perhaps an API should be created to enable developers to query data programmatically.
And, of course, it is most advantageous to build a Web-based dashboard that lets people
ask questions about the data without having to do any processing themselves. In other words, making
data truly accessible to a public audience takes more effort than simply uploading a
collection of XML files to a privately run server. Unfortunately, this type of “solution”
still happens more often than it should. Systems should be designed to share data with
the intended audience.
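As an illustration of what "query data programmatically" could mean, the following sketch shows the core of a read-only query endpoint: it parses the query string of a request URL, filters a dataset, and returns JSON. Everything here is an assumption made for the example, including the `RECORDS` dataset, the `/permits` path, and the `handle_query` function. A real service would sit behind a Web server and a scalable data store.

```python
import json
from urllib.parse import parse_qs, urlparse

# Hypothetical public dataset; a real system would read from a database.
RECORDS = [
    {"city": "Springfield", "year": 2012, "permits": 140},
    {"city": "Springfield", "year": 2013, "permits": 165},
    {"city": "Shelbyville", "year": 2013, "permits": 90},
]


def handle_query(url):
    """Return (status_code, json_body) for a URL such as /permits?city=X&year=2013.

    Each query parameter is treated as an exact-match filter on a field.
    Unknown field names yield a 400 response instead of being silently ignored.
    """
    params = parse_qs(urlparse(url).query)
    matches = RECORDS
    for field, values in params.items():
        if field not in RECORDS[0]:
            return 400, json.dumps({"error": f"unknown field: {field}"})
        # Compare as strings because query-string values are always text.
        matches = [r for r in matches if str(r[field]) == values[0]]
    return 200, json.dumps({"count": len(matches), "results": matches})


status, body = handle_query("/permits?city=Springfield&year=2013")
```

Returning a structured error for an unknown field is a deliberate choice in this sketch: developers querying public data get a clear signal when they have misspelled a field name, rather than an unfiltered result.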
This concept extends to the private sphere as well. In order for organizations to
take advantage of the data they have, employees must be able to ask questions
themselves. In the past, many organizations chose a data warehouse solution in an attempt
to merge everything into a single, manageable space. Now, the concept of becoming a
data-driven organization might include simply keeping data in whatever silo is the best
fit for the use case and building tools that can glue different systems together. In this
case, the focus is more on keeping data where it works best and finding ways to share
and process it when the need arises.
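A small "glue" tool of this kind might look like the sketch below, which answers one question across two silos without merging them into a warehouse. The setup is hypothetical: a CSV export stands in for a spreadsheet of referral sources, an in-memory SQLite table stands in for the Web application's relational database, and the column names and the `customers_by_source` helper are invented for the example.

```python
import csv
import io
import sqlite3

# Silo 1 (hypothetical): a spreadsheet export listing how each customer found us.
REFERRALS_CSV = """customer_id,referral_source
1,social
2,search
3,social
"""


def build_orders_db():
    """Silo 2 (hypothetical): the Web application's relational order table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 20.0), (1, 5.0), (2, 12.5)],
    )
    return conn


def customers_by_source(csv_text, conn):
    """Count customers with at least one order, grouped by referral source."""
    buyers = {row[0] for row in conn.execute("SELECT DISTINCT customer_id FROM orders")}
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["customer_id"]) in buyers:
            source = row["referral_source"]
            counts[source] = counts.get(source, 0) + 1
    return counts


result = customers_by_source(REFERRALS_CSV, build_orders_db())
```

Each silo stays in the format that suits its owners. Only the join key (`customer_id` in this sketch) has to be agreed on, which is usually the hard part in practice.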
 