samples per second, 16 bits per sample) requires more than 84 million bits. Downloading
music from a website at these rates would take a long time.
As human activity has a greater and greater impact on our environment, there is an ever-
increasing need for more information about our environment, how it functions, and what we
are doing to it. Various space agencies from around the world, including the European Space
Agency (ESA), the National Aeronautics and Space Administration (NASA), the Canadian
Space Agency (CSA), and the Japan Aerospace Exploration Agency (JAXA), are collaborating
on a program to monitor global change that will generate half a terabyte of data per day when it
is fully operational. New sequencing technology is resulting in ever-increasing database sizes
containing genomic information while new medical scanning technologies could result in the
generation of petabytes¹ of data.
Given the explosive growth of data that needs to be transmitted and stored, why not focus
on developing better transmission and storage technologies? This is happening, but it is
not enough. There have been significant advances that permit larger and larger volumes of
information to be stored and transmitted without using compression, including CD-ROMs,
optical fibers, Asymmetric Digital Subscriber Lines (ADSL), and cable modems. However,
while it is true that both storage and transmission capacities are steadily increasing with new
technological innovations, as a corollary to Parkinson's First Law,² it seems that the need
for mass storage and transmission increases at least twice as fast as storage and transmission
capacities improve. Then there are situations in which capacity has not increased significantly.
For example, the amount of information we can transmit over the airwaves will always be
limited by the characteristics of the atmosphere.
An early example of data compression is Morse code, developed by Samuel Morse in the
mid-19th century. Letters sent by telegraph are encoded with dots and dashes. Morse noticed
that certain letters occurred more often than others. In order to reduce the average time required
to send a message, he assigned shorter sequences to letters that occur more frequently, such as
e (·) and a (·−), and longer sequences to letters that occur less frequently, such as q (−−·−)
and j (·−−−). This idea of using shorter codes for more frequently occurring characters is
used in Huffman coding, which we will describe in Chapter 3.
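The effect of giving shorter sequences to more frequent letters can be checked with a small calculation. The sketch below is only an illustration: it uses the four letters mentioned above, writes dots and dashes as "." and "-", and counts each dot or dash as one symbol (real Morse timing makes a dash longer than a dot). The letter counts in ASSUMED_COUNTS are hypothetical values chosen for the example, not measured frequencies.

```python
# Morse sequences for the four letters named in the text ("." = dot, "-" = dash).
MORSE = {"e": ".", "a": ".-", "q": "--.-", "j": ".---"}

# Hypothetical occurrences per 100 letters (an assumption for illustration only).
ASSUMED_COUNTS = {"e": 60, "a": 38, "q": 1, "j": 1}


def average_length(code, counts):
    """Average number of dots and dashes sent per letter."""
    total_letters = sum(counts.values())
    total_symbols = sum(counts[ch] * len(code[ch]) for ch in code)
    return total_symbols / total_letters


# Fixed-length alternative: every letter costs as much as the longest sequence.
FIXED_LENGTH = max(len(seq) for seq in MORSE.values())

variable = average_length(MORSE, ASSUMED_COUNTS)
print(f"variable-length code: {variable:.2f} symbols per letter")
print(f"fixed-length code:    {FIXED_LENGTH} symbols per letter")
```

Under these assumed counts the variable-length assignment averages 1.44 symbols per letter, against 4 for a fixed-length code over the same four letters. The saving comes entirely from the skew in how often each letter occurs; with equal counts for all four letters, the average would rise to 2.75.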
Where Morse code uses the frequency of occurrence of single characters, a widely used
form of Braille code, which was also developed in the mid-19th century, uses the frequency
of occurrence of words to provide compression [1]. In Braille coding, 2 × 3 arrays of dots are
used to represent text. Different letters can be represented depending on whether the dots are
raised or flat. In Grade 1 Braille, each array of six dots represents a single character. However,
given six dots with two positions for each dot, we can obtain 2^6, or 64, different combinations.
If we use 26 of these for the different letters, we have 38 combinations left. In Grade 2 Braille,
some of these leftover combinations are used to represent words that occur frequently, such
as “and” and “for.” One of the combinations is used as a special symbol indicating that the
symbol that follows is a word and not a character, thus allowing a large number of words to be
¹ mega: 10^6, giga: 10^9, tera: 10^12, peta: 10^15, exa: 10^18, zetta: 10^21, yotta: 10^24
² Parkinson's First Law: “Work expands so as to fill the time available,” in Parkinson's Law and Other Studies in
Administration, by Cyril Northcote Parkinson, Ballantine Books, New York, 1957.