others do not. Due to the volume of data, after tokens are processed they are
typically discarded and cannot (usually) be retrieved for further analysis (i.e.
they are non-persistent). Finally, data streams usually carry data of a known
structure. Transmitters of stream data may insert data that does not conform
to the agreed format, and the stream mechanism may carry these 'false' tokens,
but without a receiver to process them, they simply become noise. While our
work on streaming data is aimed at covering multiple media types (VAST: video,
audio, still image and text), our initial focus is on textual information, abridged
from structured or unstructured documents.
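
The properties described above (single-pass processing, non-persistent tokens, and non-conforming tokens treated as noise) can be illustrated with a minimal sketch. It is not taken from the authors' system: the 'key=value' token format, the function names parse_token, consume and token_source, and the sample readings are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_token(raw: str) -> Optional[Tuple[str, float]]:
    """Return (key, value) if raw matches the assumed 'key=value' format, else None."""
    key, sep, value = raw.partition("=")
    if not sep or not key:
        return None
    try:
        return key, float(value)
    except ValueError:
        return None


def consume(stream: Iterable[str]) -> Tuple[Dict[str, float], int]:
    """Process each token exactly once, keeping only a small running summary."""
    latest: Dict[str, float] = {}
    noise = 0
    for raw in stream:
        parsed = parse_token(raw)
        if parsed is None:
            noise += 1  # non-conforming token: carried by the stream, ignored here
            continue
        key, value = parsed
        latest[key] = value  # the raw token itself is not stored
    return latest, noise


def token_source() -> Iterator[str]:
    """Hypothetical sample data standing in for a live feed."""
    yield from ["loop1=52.0", "garbage", "loop2=61.5", "loop1=48.0", "=3"]


stream = token_source()
latest, noise = consume(stream)
remaining = list(stream)  # the generator is exhausted; nothing can be re-read
```

Because the source is a generator, a second pass over the stream yields nothing, which mirrors the non-persistence of real stream tokens: only the summary kept by the receiver survives.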
Many examples of streaming data can be found on the internet:
- Real-time raw traffic loop sensor data: At http://128.95.29.3:8411,
self-describing data from the Seattle metropolitan area highway system may be
viewed in a browser. This information is processed to display the area's traffic
flow map available at http://www.wsdot.wa.gov/PugetSoundTraffic/.
- Real-time stock quotes: At http://www.pcquote.com, clicking on the 'stock
ticker' button brings up a streaming ticker of stock prices and averages at
the top of your browser.
- Audio news feeds: At http://www.npr.org/audiohelp/progstream.html,
clicking on the NPR audio online button delivers a streaming news broadcast,
available via RealAudio and other audio players.
- Video news feeds: At http://www.cnn.com/video/, several video clips are
available to review the daily news. Although these are not continuous streams
of video, they are representative of the type of video information becoming
available in streaming format.
Browsers and other applications are able to process and present this
information in formats that are often very convenient to the casual user. Yet, data
streams represent a significant challenge to information analysts wishing to use
information in more complex ways. To look at potential future issues with
respect to streaming data, Lyman and Varian [1] examined the data flows
associated with two common information broadcast media: television and radio. They
used the CIA World Factbook to find that there are 33,071 television stations in
the world. Assuming these stations broadcast about 16 hours per day, this would
equal about 193 million hours of total programming per year. They estimated that
25% of the programs were original, leading to a figure of 48 million hours each year.
Using the low end of their storage estimates, that one hour of video requires 1.3
GB of storage, worldwide program storage would be about 63,000 TB. For
radio, they estimated that FM radio stations broadcast 20 hours per day, AM
stations 16 hours per day, and shortwave stations 12 hours per day. Therefore,
they estimated that there are approximately 290 million hours (188 million FM,
98 million AM, and 6 million shortwave) of radio programming per year. Applying
a 50 MB/hour rule of thumb, we come to an estimate of the annual storage
requirement of about 14,500 TB if one were to record everything broadcast on
radio. At the time of writing, a typical desktop workstation comes equipped
with an 80 GB hard disk. Assuming all disk space is available for storage (an
incorrect, but simplifying assumption, as some space needs to be used to hold