the system software), this allows a mere 53 hours of local video storage or 1600
hours of radio. This example of potential information overload provides some
support for the argument that one-look stream processing is a desirable capabil-
ity for the desktop system. Being selective and only choosing elements of interest
drastically reduces the burden.
Data streams have become essential to science, intrinsic to a growing number of
research areas. 'Knowledge workers' in these areas are in dire need of tools
that help them cope with the substantial volume and high complexity of modern
streamed data.
Our goal in this paper is to put forth a concept for a future-generation analytical
environment for scientific discovery within data streams. We take a component-based
approach, drawing on prototypical elements designed under different auspices for
diverse purposes, and describe how to bring them together to provide
an ambient environment for collaborative study and a platform for data-stream
research.
First, we discuss our stream-processing component, designed as a first step
in marshalling complex data streams into something actionable.
2 Stream Processing Component
The concept of our stream-processing engine (shown in Fig. 1) is to provide the
ability to drastically reduce incoming data-streams to more manageable levels
by allowing the knowledge worker to implicitly define filters that restrict content
to only those tokens of intellectual value. We utilize a Content-based Messaging
System (CBMS) [2], an optimized J2EE Java Message Service (JMS) implementation that
provides highly efficient message formatting and filtering. In our prototype, this
reduced stream is then routed through a set of algorithms that produce signa-
tures (a compressed representation of the original token). A signature expresses
the semantic content of the data sub-stream it encodes with reference to top-
ics that are discovered through an unsupervised classification model. Such a
classification model is augmented with a process of ontological annotation that
identifies relevant entities and relations among them in terms of reference generic
and domain specific ontologies. The topics, entities and relations discovered are
then utilized to provide users with an information rich visualization of the data
stream.
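As a concrete illustration of this filter-then-signature path, the sketch below stands in for the CBMS selector with a simple interest predicate and for the unsupervised classifier with hand-written topic keyword sets; the token layout, function names, and topic sets are all hypothetical, not the prototype's actual interfaces.

```python
from collections import Counter

# Hypothetical stand-in for the CBMS filter: pass only tokens whose
# topics intersect the knowledge worker's declared interests.
def content_filter(stream, interests):
    for token in stream:
        if interests & set(token.get("topics", [])):
            yield token

# Hypothetical stand-in for signature generation: compress a token's
# text into per-topic weights (the real system uses an unsupervised
# classification model plus ontological annotation).
def signature(text, topic_keywords):
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return {topic: sum(counts[w] for w in kws) / total
            for topic, kws in topic_keywords.items()}

interests = {"proteomics"}
topics = {"proteomics": {"protein", "assay"}, "noise": {"lorem"}}
stream = [
    {"topics": ["proteomics"], "text": "protein assay results for protein X"},
    {"topics": ["sports"], "text": "match report"},
]
sigs = [signature(tok["text"], topics)
        for tok in content_filter(stream, interests)]
```

Only the first token survives the filter here, and its signature carries most of its weight on the "proteomics" topic.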
As signatures are generated, they are consumed into a descriptive profile (a
representation of the status quo of the reduced stream). On a token-by-token
basis, the profile may grow or remain the same depending on what that particular
signature adds to the current knowledge of this stream. After a user-defined
training period, new signatures from arriving tokens in the stream are compared
against the profile and evaluated for novel content that the knowledge worker
may be interested in.
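The profile logic can be sketched as a minimal class: it accumulates signatures during the training period, grows only when a signature adds new information, and afterwards scores arriving signatures for novelty. The distance measure, the grow-only-on-new-information rule, and the threshold below are assumptions for illustration, not the paper's actual algorithms.

```python
# Minimal sketch of the descriptive profile of a reduced stream.
class StreamProfile:
    def __init__(self, threshold=0.5):
        self.signatures = []      # representative signatures seen so far
        self.threshold = threshold

    def _dist(self, a, b):
        # L1 distance over the union of topic keys (an assumed measure).
        keys = set(a) | set(b)
        return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)

    def train(self, sig):
        # On a token-by-token basis, grow only if the signature adds
        # something to the current knowledge of the stream.
        if not self.signatures or \
                min(self._dist(sig, s) for s in self.signatures) > 0:
            self.signatures.append(sig)

    def is_novel(self, sig):
        # After training, a signature far from everything stored marks
        # content the knowledge worker may be interested in.
        return min(self._dist(sig, s) for s in self.signatures) > self.threshold
```

In use, `train` would be called for every signature during the user-defined training period, after which `is_novel` takes over for arriving tokens.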
One of the advantages of this approach is that, depending on the require-
ments of the user, different sets of algorithms may be used to perform different
actions. For example, if the user wants to monitor a data stream for novel
content (as described above), a change detection mode of operation is selected.
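This pluggable-algorithms idea can be sketched as a registry that maps a mode of operation to an ordered set of processing stages; the mode names and the toy stages below are purely illustrative, not the engine's actual algorithm sets.

```python
# Each mode of operation selects a different ordered algorithm set;
# the engine threads the token stream through the selected set.
def lowercase(tokens):
    return [t.lower() for t in tokens]

def dedupe(tokens):
    return list(dict.fromkeys(tokens))  # order-preserving de-duplication

MODES = {
    "change_detection": [lowercase, dedupe],
    "archive": [lowercase],
}

def run(mode, tokens):
    for stage in MODES[mode]:
        tokens = stage(tokens)
    return tokens
```

Swapping the list registered under a mode changes the pipeline's behaviour without touching the engine itself, which is the flexibility the paragraph above describes.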