Four Rules for Data Success - Data Just Right: Introduction to Large-Scale Data and Analytics


with one another. These data silos can also hamper the ability of data solutions to scale. For example, it is practical for accountants to work with spreadsheets, for the Web site development team to build their applications on relational databases, and for financial analysts to use a variety of statistics packages and visualization tools. In these situations, it can become difficult to ask questions about the data across the variety of software used throughout the company. Answering a question such as “how many of our online customers have found our product through our social media networks, and how much do we expect this number to increase if we improve our online advertising?” would require information from each of these silos.

Indeed, whenever you move from one database paradigm to another, there is an inherent, and often unknown, cost. A simple example is moving from a relational database to a key-value database: data that is already under management must be migrated, new software must be installed, and engineers must develop new skills. Making smart choices at the beginning of the design process may mitigate these problems. In Chapter 3, “Building a NoSQL-Based Web App to Collect Crowd-Sourced Data,” we will discuss the process of using a NoSQL database to build an application that expects a high volume of traffic from users.
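
Even the simplest paradigm switch forces new design decisions. The following Python sketch assumes a hypothetical `customers` table in SQLite and uses a plain dict as a stand-in for a key-value database. It shows that every row must be reshaped into a key/value pair, which means choosing a key scheme (here `table:id`) and a serialization format (here JSON), decisions the relational schema never required.

```python
import json
import sqlite3


def migrate_to_key_value(conn: sqlite3.Connection, table: str, key_column: str) -> dict:
    """Copy every row of a relational table into a key-value mapping.

    A plain dict stands in for the target key-value database (an assumption
    for illustration). Each row becomes one JSON-encoded value stored under a
    "table:key" string key.
    """
    store = {}
    # Identifiers cannot be bound as SQL parameters, so `table` must be a trusted name.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for values in cursor:
        record = dict(zip(columns, values))
        store[f"{table}:{record[key_column]}"] = json.dumps(record, sort_keys=True)
    return store


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, channel TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, channel) VALUES (?, ?)",
        [("Ada", "social"), ("Grace", "search")],
    )
    store = migrate_to_key_value(conn, "customers", "id")
    print(store["customers:1"])  # prints {"channel": "social", "id": 1, "name": "Ada"}
```

A real migration would also have to deal with joins and secondary indexes, which many key-value stores do not provide, so any queries that relied on them must be redesigned.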

A common theme throughout this book is use cases that combine several technologies to deal with issues of scale: one technology may be useful for collecting data, another for archiving it, and yet another for high-speed analysis.

Build Systems That Can Share Data (On the Internet)

For public data to be useful, it must be accessible. The technological choices made when designing systems to deliver this data depend entirely on the intended audience. Consider the task of a government making public data more accessible to its citizens. To make the data as accessible as possible, data files should be hosted on a scalable system that can handle many users at once. The data should be published in formats that researchers can easily access and use to generate reports. Perhaps an API should be created to enable developers to query the data programmatically. And, of course, it is most advantageous to build a Web-based dashboard that lets people ask questions about the data without having to do any processing themselves. In other words, making data truly accessible to a public audience takes more effort than simply uploading a collection of XML files to a privately run server. Unfortunately, this type of “solution” still happens more often than it should. Systems should be designed to share data with the intended audience.
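
To make the API option concrete, here is a minimal sketch of a read-only JSON endpoint built only on Python's standard library. The `/records` path, the field-equals-value query style, and the `RECORDS` dataset are all hypothetical; the point is only to show developers querying data programmatically, not to suggest a production design.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

# Hypothetical public dataset; a real service would load this from scalable storage.
RECORDS = [
    {"year": 2012, "region": "north", "permits": 120},
    {"year": 2012, "region": "south", "permits": 95},
    {"year": 2013, "region": "north", "permits": 134},
]


def filter_records(records, params):
    """Keep records whose fields equal every query parameter (compared as strings)."""
    return [
        record
        for record in records
        if all(str(record.get(field)) == values[0] for field, values in params.items())
    ]


class RecordsHandler(BaseHTTPRequestHandler):
    """Serves GET /records?field=value as JSON; any other path returns 404."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/records":
            self.send_error(404, "Unknown endpoint")
            return
        matches = filter_records(RECORDS, parse_qs(url.query))
        body = json.dumps(matches).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 0 asks the OS for a free port; the server runs in a background thread.
    server = ThreadingHTTPServer(("127.0.0.1", 0), RecordsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    with urlopen(f"http://127.0.0.1:{port}/records?region=north&year=2013") as response:
        print(json.load(response))  # prints [{'year': 2013, 'region': 'north', 'permits': 134}]
    server.shutdown()
    server.server_close()
```

Python's documentation warns that `http.server` is not recommended for production; a real public service would run behind a hardened Web server or API platform that can absorb many simultaneous users.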

This concept extends to the private sphere as well. For organizations to take advantage of the data they have, employees must be able to ask questions of it themselves. In the past, many organizations chose a data warehouse solution in an attempt to merge everything into a single, manageable space. Now, becoming a data-driven organization might simply mean keeping data in whichever silo best fits the use case and building tools that glue the different systems together. In this case, the focus is on keeping data where it works best and finding ways to share and process it when the need arises.
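
The “glue” approach can be sketched in a few lines of Python. This example assumes two hypothetical silos: a marketing spreadsheet, exported as CSV, that records each customer's referral channel, and a relational `customers` table (here an in-memory SQLite database) that flags online customers. The small tool answers the first half of the cross-silo question posed earlier without moving either data set into a warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical spreadsheet export: which channel referred each customer.
MARKETING_CSV = """customer_id,referral_channel
1,social
2,search
3,social
"""


def count_online_by_channel(conn: sqlite3.Connection, csv_text: str, channel: str) -> int:
    """Count online customers referred by `channel`, joining the two silos in memory.

    Neither source is copied into a central store; each is read where it lives.
    """
    # Spreadsheet silo: map customer_id (as text) to referral channel.
    referrals = {
        row["customer_id"]: row["referral_channel"]
        for row in csv.DictReader(io.StringIO(csv_text))
    }
    # Relational silo: IDs of customers who bought online (hypothetical schema).
    online_ids = [
        str(customer_id)
        for (customer_id,) in conn.execute("SELECT id FROM customers WHERE is_online = 1")
    ]
    return sum(1 for customer_id in online_ids if referrals.get(customer_id) == channel)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, is_online INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 1), (2, 1), (3, 0)])
    print(count_online_by_channel(conn, MARKETING_CSV, "social"))  # prints 1
```

The forecasting half of the question would need yet another silo, such as the statistics packages the analysts already use; the design choice is the same, reading each source in place and combining results only when a question requires it.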
