the system software), this allows a mere 53 hours of local video storage or 1600
hours of radio. This example of potential information overload provides some
support for the argument that one-look stream processing is a desirable capabil-
ity for the desktop system. Being selective and only choosing elements of interest
drastically reduces the burden.
Data streams have become essential to science, intrinsic to a growing number of
research areas. 'Knowledge workers' in these areas are in dire need of tools
that help them cope with the substantial volume and high complexity of modern
streamed data.
Our goal in this paper is to put forth a concept for a future-generation analytical
environment for scientific discovery within data streams. We take a component-based
approach, drawing on prototypical elements designed under different auspices for
diverse purposes, and describe how to bring them together to provide
an ambient environment for collaborative study and a platform for data-stream
research.
First, we discuss our stream-processing component, designed as a first step
in marshalling complex data streams into something actionable.
2 Stream Processing Component
The concept of our stream-processing engine (shown in Fig. 1) is to provide the
ability to drastically reduce incoming data-streams to more manageable levels
by allowing the knowledge worker to implicitly define filters that restrict content
to only those tokens of intellectual value. We utilize a Content-based Messaging
System (CBMS) [2], an optimized J2EE Java Message Service (JMS) implementation that
provides highly efficient message formatting and filtering. In our prototype, this
reduced stream is then routed through a set of algorithms that produce signa-
tures (a compressed representation of the original token). A signature expresses
the semantic content of the data sub-stream it encodes with reference to top-
ics that are discovered through an unsupervised classification model. Such a
classification model is augmented with a process of ontological annotation that
identifies relevant entities and relations among them in terms of reference generic
and domain specific ontologies. The topics, entities and relations discovered are
then utilized to provide users with an information rich visualization of the data
stream.
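As a concrete illustration of this filter-then-signature path, the sketch below stands in for the CBMS selector with a simple interest predicate and for the unsupervised classifier with hand-written topic keyword sets; the token layout, function names, and topic sets are all hypothetical, not the prototype's actual interfaces.

```python
from collections import Counter

# Hypothetical stand-in for the CBMS filter: pass only tokens whose
# topics intersect the knowledge worker's declared interests.
def content_filter(stream, interests):
    for token in stream:
        if interests & set(token.get("topics", [])):
            yield token

# Hypothetical stand-in for signature generation: compress a token's
# text into per-topic weights (the real system uses an unsupervised
# classification model plus ontological annotation).
def signature(text, topic_keywords):
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return {topic: sum(counts[w] for w in kws) / total
            for topic, kws in topic_keywords.items()}

interests = {"proteomics"}
topics = {"proteomics": {"protein", "assay"}, "noise": {"lorem"}}
stream = [
    {"topics": ["proteomics"], "text": "protein assay results for protein X"},
    {"topics": ["sports"], "text": "match report"},
]
sigs = [signature(tok["text"], topics)
        for tok in content_filter(stream, interests)]
```

Only the first token survives the filter here, and its signature carries most of its weight on the "proteomics" topic.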
As signatures are generated, they are consumed into a descriptive profile (a
representation of the status quo of the reduced stream). On a token-by-token
basis, the profile may grow or remain the same depending on what that particular
signature adds to the current knowledge of this stream. After a user-defined
training period, new signatures from arriving tokens in the stream are compared
against the profile and evaluated for novel content that the knowledge worker
may be interested in.
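The profile logic can be sketched as a minimal class: it accumulates signatures during the training period, grows only when a signature adds new information, and afterwards scores arriving signatures for novelty. The distance measure, the grow-only-on-new-information rule, and the threshold below are assumptions for illustration, not the paper's actual algorithms.

```python
# Minimal sketch of the descriptive profile of a reduced stream.
class StreamProfile:
    def __init__(self, threshold=0.5):
        self.signatures = []      # representative signatures seen so far
        self.threshold = threshold

    def _dist(self, a, b):
        # L1 distance over the union of topic keys (an assumed measure).
        keys = set(a) | set(b)
        return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)

    def train(self, sig):
        # On a token-by-token basis, grow only if the signature adds
        # something to the current knowledge of the stream.
        if not self.signatures or \
                min(self._dist(sig, s) for s in self.signatures) > 0:
            self.signatures.append(sig)

    def is_novel(self, sig):
        # After training, a signature far from everything stored marks
        # content the knowledge worker may be interested in.
        return min(self._dist(sig, s) for s in self.signatures) > self.threshold
```

In use, `train` would be called for every signature during the user-defined training period, after which `is_novel` takes over for arriving tokens.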
One of the advantages of this approach is that, depending on the require-
ments of the user, different sets of algorithms may be used to perform different
actions. For example, if the user wants to monitor a data stream for novel
content (as described above), a change detection mode of operation is selected.
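This pluggable-algorithms idea can be sketched as a registry that maps a mode of operation to an ordered set of processing stages; the mode names and the toy stages below are purely illustrative, not the engine's actual algorithm sets.

```python
# Each mode of operation selects a different ordered algorithm set;
# the engine threads the token stream through the selected set.
def lowercase(tokens):
    return [t.lower() for t in tokens]

def dedupe(tokens):
    return list(dict.fromkeys(tokens))  # order-preserving de-duplication

MODES = {
    "change_detection": [lowercase, dedupe],
    "archive": [lowercase],
}

def run(mode, tokens):
    for stage in MODES[mode]:
        tokens = stage(tokens)
    return tokens
```

Swapping the list registered under a mode changes the pipeline's behaviour without touching the engine itself, which is the flexibility the paragraph above describes.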