Fig. 3. A schematic of the Stream Tap User Interface.
data stream. For example, if a user wanted to tap a data stream of news stories,
and only wanted to know of new events in the Middle East, they may restrict the
data stream to news stories that contain specific keywords in the title. To help
with this step, an exemplar from the data stream is presented, and the user may
utilize the filtering capabilities of the CBMS to restrict the stream to a more
manageable level. In some cases, for example when classifying stream content
into concept bins, no reduction is performed. The next step is to classify the
tokens flowing in the data stream. This step allows the system to present the
early results of data reduction and gives the user the opportunity to revise their
reduction criteria. Again, elements of the user interface are specific to the task
in hand. From a clustering perspective this stage would include a breakdown of
what clusters have formed. From a change detection standpoint, the top 'n' topics
could be presented. The next step allows the user to define what elements of
the process they wish to be notified about. For example, for the clustering task,
this may be a notification when 'n' clusters have formed, or when the average
number of items per cluster reaches a threshold. Similarly, for change detection,
this may be a notification when the highest rated topic is replaced, or when
a new topic enters the top ten. The final two steps define system and report
settings. These are defined globally for a user but may be overridden at these
stages if required.
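
The reduction and notification steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Token` fields, the `title_filter` helper, and the `TopTopicsMonitor` class are hypothetical names introduced here, and topic labels are assumed to be already attached to each token. The sketch restricts a stream to titles containing given keywords, then reports when the highest rated topic is replaced or a new topic enters the top 'n'.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Token:
    """One item from the stream; both fields are illustrative assumptions."""
    title: str
    topic: str


def title_filter(keywords):
    """Return a predicate that keeps tokens whose title mentions any keyword."""
    lowered = [k.lower() for k in keywords]
    return lambda token: any(k in token.title.lower() for k in lowered)


@dataclass
class TopTopicsMonitor:
    """Counts tokens per topic and reports changes to the top-n ranking."""
    n: int = 10
    counts: Counter = field(default_factory=Counter)

    def _ranking(self):
        # Sort by descending count, then by name, so ties rank deterministically.
        ordered = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [topic for topic, _ in ordered[: self.n]]

    def observe(self, token):
        """Count one token and return the notifications it triggers."""
        before = self._ranking()
        self.counts[token.topic] += 1
        after = self._ranking()
        events = []
        if before and after[0] != before[0]:
            events.append(f"top topic replaced: {before[0]} -> {after[0]}")
        for topic in after:
            if topic not in before:
                events.append(f"new topic in top {self.n}: {topic}")
        return events


# Usage: keep only stories whose title names one of two cities, then monitor.
stream = [
    Token("Talks resume in Cairo", "diplomacy"),
    Token("Local football results", "sport"),
    Token("Cairo summit ends", "diplomacy"),
    Token("Oil prices rise after Riyadh meeting", "energy"),
    Token("Riyadh output cut", "energy"),
    Token("Riyadh and Cairo react", "energy"),
]
keep = title_filter(["Cairo", "Riyadh"])
monitor = TopTopicsMonitor(n=2)
notifications = []
for token in stream:
    if keep(token):
        notifications.extend(monitor.observe(token))
```

In this sketch the filter runs before the monitor, so the sports story never influences the ranking; this mirrors the order of the reduction and notification steps described above.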
The physical architecture of the data stream engine is shown in Fig. 4. We
rely on open-source components and utilize J2EE to allow for bean processing
of individual tokens.
For each stream that is added to our system, an XML stream format descriptor
depicts the stream content. An ingestor uses this information to extract tokens
from the stream and present them to the CBMS. The tokens that are allowed
to pass through the CBMS are passed to a set of algorithm beans, specific to
the task in hand. A stream profile bean is responsible for describing the current
state of the stream flow and, through interactions with the monitor bean, a
decision is made as to whether and when to notify the user of an event. This action is
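
The flow from descriptor to notification described in this paragraph can be sketched as follows. This is a rough Python illustration under stated assumptions, not the J2EE bean implementation: the descriptor layout (`<token element="...">` with `<field name="..."/>` children), the sample feed, and the names `ingest`, `StreamProfile`, and `run` are all hypothetical, since the text does not give the real descriptor format or bean interfaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor: names the element that delimits one token and the
# child elements to extract from it. The real format is not given in the text.
DESCRIPTOR = """
<stream name="news">
  <token element="item">
    <field name="title"/>
    <field name="body"/>
  </token>
</stream>
"""

# Made-up sample stream used only to exercise the sketch.
SAMPLE_STREAM = """
<feed>
  <item><title>Storm warning issued</title><body>Heavy rain expected.</body></item>
  <item><title>Market update</title><body>Shares were flat.</body></item>
  <item><title>Storm damage assessed</title><body>Roads reopened.</body></item>
</feed>
"""


def ingest(descriptor_xml, stream_xml):
    """Yield one dict per token, using the descriptor to locate its fields."""
    token_spec = ET.fromstring(descriptor_xml.strip()).find("token")
    fields = [f.get("name") for f in token_spec.findall("field")]
    root = ET.fromstring(stream_xml.strip())
    for record in root.iter(token_spec.get("element")):
        yield {name: record.findtext(name, default="") for name in fields}


class StreamProfile:
    """Summarises the current state of the flow (here, two counters only)."""

    def __init__(self):
        self.seen = 0
        self.passed = 0


def run(descriptor_xml, stream_xml, cbms_filter, algorithm, notify_when):
    """Ingest tokens, filter them, run the algorithm, and collect notifications."""
    profile = StreamProfile()
    notifications = []
    for token in ingest(descriptor_xml, stream_xml):
        profile.seen += 1
        if not cbms_filter(token):
            continue  # token rejected by the filter stage
        profile.passed += 1
        algorithm(token)
        if notify_when(profile):
            notifications.append(
                f"{profile.passed} of {profile.seen} tokens passed the filter"
            )
    return profile, notifications


# Usage: keep storm stories, collect them, notify once two have passed.
matched = []
profile, notes = run(
    DESCRIPTOR,
    SAMPLE_STREAM,
    cbms_filter=lambda t: "storm" in t["title"].lower(),
    algorithm=matched.append,
    notify_when=lambda p: p.passed == 2,
)
```

Here the filter, the algorithm, and the notification rule are passed in as plain callables, standing in for the CBMS, the algorithm beans, and the monitor bean respectively.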