problem—one in which a disparity between two physical information stores blocked
easy interoperability between the data source and the data processing system. Luhn had
imagined that a growth in the adoption of analog to digital bridge technology, such as
typewriters with simultaneous ticker tape printout, could eliminate this disparity.
Although the dream of a truly paperless office has been a common meme throughout
the past few decades (paper use actually doubled between 1980 and 2000 [3]), casual
business information is rapidly being generated digitally. Unstructured but useful
sources of information are becoming the norm as consumer and business platforms
converge. Emails, customer reviews, tweets, and user groups can be both valuable
sources of “business information” and a nightmare to search, store, and query. Some
business applications are mirroring features found in social media, enabling employees
to stream posts of data to company-wide social media platforms. As a result, data
becomes more fractured, more unstructured, and simply more, period.
The Problem in Practice
Let's look at an example of a typical data silo challenge. Customers generate data of
all kinds, and this generated data is difficult to control. A customer might report her
location as “California” in one transaction and use the abbreviation “CA” in another.
Customers also generate support questions, post messages on your company's Facebook
page, send emails, and typically do everything in their power to ensure that whatever
ideal data model you want to conform to will be disregarded. Dealing with customer
data can be difficult enough, but what about all the other data required to understand
your business? This might include data from your product inventory, human resources,
advertising, finance, and any number of applications crucial to business decisions.
In order to make any sense of this data, it must often be cleaned, or transformed
into a more normalized form. Erroneous data must be corrected or discarded, and
dates must all be converted into the same format. More importantly, if data sets are
to be joined in any meaningful way, common keys must be available. In other words,
a user's ID must be the exact same value in the purchase database as it is in your
customer support logs.
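
The cleaning steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete cleaning pipeline: the lookup table, the list of date formats, and the record fields (`user_id`, `state`, `date`) are all hypothetical and would need to match whatever your own data actually contains.

```python
from datetime import datetime

# Hypothetical lookup table; a real one would cover every state and common misspellings.
STATE_ABBREVIATIONS = {"california": "CA", "new york": "NY", "texas": "TX"}

# Hypothetical list of date layouts seen in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_state(raw):
    """Return a two-letter code, or None if the value is not recognized."""
    value = raw.strip()
    if value.upper() in STATE_ABBREVIATIONS.values():
        return value.upper()
    return STATE_ABBREVIATIONS.get(value.lower())


def normalize_date(raw):
    """Return the date as an ISO 8601 string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Return a cleaned copy of the record, or None if it must be discarded."""
    state = normalize_state(record["state"])
    date = normalize_date(record["date"])
    if state is None or date is None:
        return None
    # Store the ID as a stripped string so it compares equal across data sources.
    return {"user_id": str(record["user_id"]).strip(), "state": state, "date": date}
```

With this sketch, a record reporting "California" and one reporting "CA" both end up with the same state value, and a record whose state or date cannot be interpreted is dropped rather than passed along.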
Once this data is processed, it must be stored in a way that enables users to ask
questions about their data. Query results from this data can then be brought into
visualization tools or moved into spreadsheets for further analysis. All modern
organizations, big or small, deal with data in some way, and each of these steps can become
daunting if data sizes are large and data sources are disparate.
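
As a rough illustration of storing processed data so that questions can be asked of it, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, columns, and sample rows are invented for the example; the point is only that a query across two data sources works once both share the same `user_id` key.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding two hypothetical, already-cleaned data sets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchases (user_id TEXT, state TEXT, amount_cents INTEGER);
        CREATE TABLE support_logs (user_id TEXT, message TEXT);
    """)
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [("42", "CA", 1999), ("42", "CA", 500), ("7", "NY", 1250)],
    )
    conn.executemany(
        "INSERT INTO support_logs VALUES (?, ?)",
        [("42", "Where is my order?"), ("42", "Still waiting.")],
    )
    return conn


def spend_for_users_with_tickets(conn):
    """Total purchase amount (in cents) for each user who has contacted support."""
    query = """
        SELECT p.user_id, SUM(p.amount_cents)
        FROM purchases AS p
        JOIN (SELECT DISTINCT user_id FROM support_logs) AS s
          ON s.user_id = p.user_id
        GROUP BY p.user_id
        ORDER BY p.user_id
    """
    return conn.execute(query).fetchall()
```

Calling `spend_for_users_with_tickets(build_demo_db())` returns one row per matching user, which could then be exported to a spreadsheet or a visualization tool. The join only finds matches because the key has the same value in both tables; had one source stored the ID as " 42" or "0042", that customer would silently drop out of the result.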
I once worked at a small nonprofit, and we faced the same types of data issues as
any large corporation. Our donor database was implemented using a relational database
hosted on a single machine. We also had a Web-based system for online donations,
which collected names and addresses, and this information was stored in a separate
relational database. We made extra money by selling topics and CDs, the inventory
3. www.economist.com/node/12381449
 