memory size and huge amounts of data stream
characteristics. Furthermore, the data structure
needs to be incrementally maintained, since it is
not possible to rescan the entire input given the
huge amount of stream data and the requirement
for rapid online querying.
Questions that need to be considered in this stage
include: What information do we need to store to
process the data warehousing and mining tasks?
What data structures do we use to store the data
and/or metadata? How and when do we update the
stored information? Is the data structure efficient
enough for the data mining and retrieval tasks?
The data storage stage is directly related to the
data processing stage, where the data warehousing
and mining tasks are performed.
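The incremental-maintenance requirement described above can be illustrated with a small sketch. It is not a structure taken from the text: the class name RunningStats and the sample readings are hypothetical, and the update rule is Welford's one-pass method, chosen here only because it keeps a constant-size summary that is updated per reading without rescanning earlier input.

```python
class RunningStats:
    """Constant-size summary of a stream, updated one reading at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Welford's update: each reading is touched once and then discarded.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self._m2 / self.count if self.count else 0.0


# Hypothetical sensor readings, consumed in a single pass.
stats = RunningStats()
for reading in [21.5, 22.0, 21.0, 23.5]:
    stats.update(reading)
```

Only three numbers are stored regardless of stream length, which is the kind of trade-off the storage questions above ask about.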
Data Processing
In the data processing stage, different data
warehousing and mining tasks are performed based
on the user's query. This is the main stage of
the data processing system, where different data
warehousing and data mining methodologies are
used to discover knowledge or potentially
important information.
The first issue that needs to be addressed in this
stage is the data processing model. According to
(Zhu, 2002), there are three stream data processing
models: Landmark, Damped and Sliding Windows.
The Landmark model mines all collected
information over the entire history of stream data
from a specific time point, called the landmark, to
the present. The Damped model, also called the
Time-Fading model, mines information in stream
data in which each transaction has a weight and
this weight decreases with age. Older transactions
contribute less weight towards the mining results.
The Sliding Windows model finds and maintains
the most recent information in sliding windows.
Only the part of the data stream within the sliding
window is stored and processed at the time when
the stream data flows in. The size of the sliding
window may be decided according to the
application and system resources. All three models
can be converted to one another. Choosing which
kind of data processing model to use largely
depends on the application needs.
The next issue that needs to be considered in this
stage is which data warehousing and data mining
methods are suitable in the application
environment. A number of questions need to be
considered: Should we use an exact or approximate
algorithm to perform the mining task? Can the
error rate be guaranteed if it is an approximate
algorithm? How can the error be reduced and
bounded? What is the tradeoff between accuracy
and processing speed? Is the data processed within
one pass? Can this method handle a large amount
of data? What are the mechanisms to maintain and
update the data structure and mining results? Is
the method resource aware? Can the processing
methods handle timeline queries and/or
multidimensional stream data?
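The three processing models (Landmark, Damped and Sliding Windows) can be contrasted with a small sketch. This is only an illustration under stated assumptions, not an implementation from (Zhu, 2002): the class names, the running-sum aggregate, the per-arrival decay factor and the count-based window size are all hypothetical choices made to keep the example short.

```python
from collections import deque


class LandmarkSum:
    """Landmark model: aggregate every reading from the landmark onward."""

    def __init__(self, landmark):
        self.landmark = landmark
        self.total = 0.0

    def update(self, timestamp, value):
        if timestamp >= self.landmark:
            self.total += value


class DampedSum:
    """Damped (time-fading) model: older readings carry less weight."""

    def __init__(self, decay):
        self.decay = decay  # assumed multiplier per arrival, 0 < decay < 1
        self.total = 0.0

    def update(self, timestamp, value):
        # Fade everything seen so far, then add the new reading at full weight.
        self.total = self.total * self.decay + value


class SlidingWindowSum:
    """Sliding window model: only the most recent `size` readings are kept."""

    def __init__(self, size):
        self.window = deque(maxlen=size)

    def update(self, timestamp, value):
        self.window.append(value)  # the oldest reading drops out automatically

    @property
    def total(self):
        return sum(self.window)


# Hypothetical stream: ten readings of value 1.0 at timestamps 0..9.
readings = [(t, 1.0) for t in range(10)]
landmark = LandmarkSum(landmark=4)
damped = DampedSum(decay=0.5)
sliding = SlidingWindowSum(size=3)
for t, v in readings:
    for model in (landmark, damped, sliding):
        model.update(t, v)
```

On the same input the three aggregates differ: the landmark sum counts the six readings from timestamp 4 onward, the damped sum approaches but never reaches 2, and the window sum only reflects the last three readings.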
Data Reporting and Analysis Issues
Data reporting and analysis is the last stage in the
sensor stream processing infrastructure, where
the mining results are monitored, analyzed and
reported. Which monitoring and visualization
techniques and devices to use is largely
application dependent.
Questions that need to be considered in this stage
include: What is a suitable layout and structure
for the reporting interface? How do end users set
parameters in the queries, for example, from pick
lists or drop-down menus? What is the maximum
query response time requested by the end users?
What visualization techniques are appropriate to
use in a particular application domain?
Having discussed the issues that need to be
considered in data warehousing and mining
in the sensor stream application domain, in the next
section we present a framework in which domain-
driven data warehousing and mining tasks are
used for knowledge discovery in sensor stream
applications.