but simple. In fact, several factors make it a major challenge. First and foremost,
the volume or amount of data that companies have is massive and growing all
the time. Walmart estimates that its data warehouse (a type of database we will
explore later) alone contains hundreds of terabytes (hundreds of trillions of characters) of data
and is constantly growing. The number of people who want access to the data is
also growing: at one time, only a select group of a company's own employees were
concerned with retrieving its data, but this has changed. Now, not only do far
more of a company's employees demand access to its data, but so do
the company's customers and trading partners. All major banks today give their
depositors Internet access to their accounts. Increasingly tightly linked "supply
chains" require that companies provide other companies, such as their suppliers and
customers, with access to their data. The combination of huge volumes of data and
large numbers of people demanding access to it has created a major performance
challenge. How do you sift through so much data for so many people and give them
the data that they want in an acceptably small amount of time? How much patience
would you have with an insurance company that kept you on the phone for five or
ten minutes while it retrieved claim data about which you had a question? Of course,
the tremendous advances in computer hardware, including data storage hardware,
have helped; indeed, it would have been impossible to go as far as we have
in information systems without them. But as the hardware continues to improve,
the volumes of data and the number of people who want access to it also increase,
making it a continuing struggle to provide them with acceptable response times.
Other factors that enter into data storage and retrieval include data security,
data privacy, and backup and recovery. Data security involves a company protecting
its data from theft, malicious destruction, deliberate attempts to make phony changes
to the data (e.g., someone trying to increase his own bank account balance), and even
accidental damage by the company's own employees. Data privacy means ensuring
that even employees who normally have access to the company's data (much less
outsiders) are given access only to the specific data they need in their work. Put
another way, sensitive data such as employee salary data and personal customer
data should be accessible only by employees whose job functions require it. Backup
and recovery means the ability to reconstruct data if it is lost or corrupted, say in
a hardware failure. The extreme case of backup and recovery, in which an entire
information system is destroyed by fire, a hurricane, or another calamity, is known
as disaster recovery.
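The data privacy idea described above, that employees see only the specific data their work requires, can be sketched in a few lines of Python. This is a minimal illustration and not how any particular database system implements access control; the role names, field names, and the visible_record function are all hypothetical.

```python
# Hypothetical roles mapped to the fields each job function needs (illustrative only).
ROLE_FIELDS = {
    "payroll_clerk": {"employee_id", "name", "salary"},
    "help_desk": {"employee_id", "name"},
}


def visible_record(record, role):
    """Return only the fields that the given role is permitted to see."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {field: value for field, value in record.items() if field in allowed}


# Made-up sample record for demonstration.
employee = {"employee_id": 1001, "name": "Pat Lee", "salary": 58000}

print(visible_record(employee, "help_desk"))      # salary is withheld
print(visible_record(employee, "payroll_clerk"))  # salary is included
```

In this sketch, a role that is not listed receives an empty record, so access must be granted explicitly rather than assumed.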
Another whole dimension involves maintaining the accuracy of a company's
data. Historically, and in many cases even today, the same data is stored several,
sometimes many, times within a company's information system. Why does this
happen? For several reasons. Many companies are simply not organized to share
data among multiple applications. Every time a new application is written, new data
files are created to store its data. As recently as the early 1990s, I spoke to a database
administration manager (more on this type of position later) in the securities industry
who told me that one of the reasons he was hired was to reduce duplicate data
appearing in as many as 60 to 70 files! Furthermore, depending on how database files
are designed, data can even be duplicated within a single file. We will explore this
issue much more in this book, but for now, suffice it to say that duplicate data, either
in multiple files or in a single file, can cause major data accuracy problems.
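To make the accuracy problem concrete, the following Python sketch shows two application files that each hold their own copy of the same customer's address. It is an illustration under assumed names only: the billing and shipping files, the customer record, and both helper functions are hypothetical and do not come from the text.

```python
# Two hypothetical application files, each storing its own copy of the same customer data.
billing_file = {"C042": {"name": "A. Jones", "address": "12 Oak St"}}
shipping_file = {"C042": {"name": "A. Jones", "address": "12 Oak St"}}


def update_address(data_file, customer_id, new_address):
    """Change the address in one file only, as a single application would."""
    data_file[customer_id]["address"] = new_address


def find_conflicts(file_a, file_b, field):
    """List the customer IDs whose copies of `field` disagree between the two files."""
    shared_ids = file_a.keys() & file_b.keys()
    return sorted(cid for cid in shared_ids if file_a[cid][field] != file_b[cid][field])


# The billing application records a move; the shipping copy is never updated.
update_address(billing_file, "C042", "98 Elm Ave")

print(find_conflicts(billing_file, shipping_file, "address"))  # the two copies now disagree
```

Once the two copies differ, nothing in either file indicates which address is correct, which is the kind of accuracy problem that storing the data once is meant to avoid.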
Data as a Corporate Resource
Every corporate resource must be carefully managed so that the company can
keep track of it, protect it, and distribute it to those people and purposes in the