Questions to consider in this step include: Which information
needs to be extracted? Why does this information need to be
collected? How will the collected data work with the data
warehousing and mining tasks and processes? After the needed
information is extracted and collected from the data
transportation package, some metadata regarding the application
domain can be gathered from the end user or from the collected
information, for example the data description of the application
domain, the data ranges, and the data types. All of this
associated information can be loaded into the framework after
being converted to the required format.
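As an illustration of gathering metadata from the collected information itself, the following minimal sketch derives a data type and a data range for one field from sample values. The names `FieldMetadata` and `infer_metadata` are hypothetical and are not part of the framework described here; the description text is assumed to come from the end user.

```python
from dataclasses import dataclass


@dataclass
class FieldMetadata:
    """Hypothetical metadata record for one field of the application domain."""
    name: str
    description: str   # supplied by the end user
    data_type: type    # inferred from the samples
    minimum: float     # lower end of the observed data range
    maximum: float     # upper end of the observed data range


def infer_metadata(name, values, description=""):
    """Derive the data type and data range of a field from collected samples."""
    if not values:
        raise ValueError("at least one sample value is required")
    # Assumption: samples are numeric; any non-integer sample makes the field a float.
    data_type = int if all(isinstance(v, int) for v in values) else float
    return FieldMetadata(name, description, data_type, min(values), max(values))


meta = infer_metadata("temperature", [21.5, 19.0, 23.25], "room temperature")
print(meta)
```

A range inferred this way only reflects the samples seen so far, so in practice it would probably be checked against ranges stated by the end user.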
When the transformed data is loaded into the processing system,
the issue to consider is whether the data processing rate can
keep up with the data arrival rate. The mining system should
process data faster than it arrives; otherwise, data
approximation techniques, such as sampling and load shedding,
need to be applied. If so, the questions to consider include:
Which speed-adjusting techniques are suitable? How will they
affect the accuracy of the mining result? What is the maximum
allowable error rate specified by the end user? How can the
selected technique be applied to the application system?
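One simple way to picture load shedding is to drop records at random whenever data arrives faster than it can be processed. The sketch below is only an illustration under that assumption; `keep_probability` and `shed_load` are hypothetical names, and the two rates are assumed to be measured elsewhere in records per second.

```python
import random


def keep_probability(arrival_rate, processing_rate):
    """Fraction of records to keep so the kept rate matches the processing rate."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return min(1.0, processing_rate / arrival_rate)


def shed_load(stream, arrival_rate, processing_rate, rng=None):
    """Yield a random subset of the stream (random load shedding)."""
    rng = rng or random.Random()
    p = keep_probability(arrival_rate, processing_rate)
    for record in stream:
        # random() returns a value in [0, 1), so p == 1.0 keeps every record.
        if rng.random() < p:
            yield record


# Records arrive four times faster than they can be processed,
# so roughly one record in four is kept.
kept = list(shed_load(range(10_000), arrival_rate=1000, processing_rate=250,
                      rng=random.Random(0)))
print(len(kept))
```

Dropping records this way trades accuracy for speed, which is why the maximum allowable error rate specified by the end user matters when choosing the keep probability.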
Data Transformation
Data Storage and Processing Issues
In the data transformation step, data is transformed into
formats that the processing unit can process. The metadata is
used to generate indicator flags that reflect the data's
effectiveness, i.e., whether a value is missing, corrupted, or
neither. Each set of information from a particular sensor is
linked to the sensor's identifier, so that it is easier to
identify sensor relationships and perform data analysis. More
detailed information will be discussed in the framework section.
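The indicator flags described above might look like the following minimal sketch. It assumes that the metadata supplies a valid range for each sensor and that an unparseable or out-of-range value counts as corrupted; both are assumptions made for illustration, and `Flag`, `Reading`, and `transform` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Flag(Enum):
    VALID = "valid"
    MISSING = "missing"
    CORRUPTED = "corrupted"


@dataclass
class Reading:
    sensor_id: str          # keeps the value linked to its sensor
    value: Optional[float]
    flag: Flag


def transform(sensor_id, raw_value, minimum, maximum):
    """Convert one raw value to a flagged reading using the metadata range."""
    if raw_value is None or raw_value == "":
        return Reading(sensor_id, None, Flag.MISSING)
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        # Assumption: a value that cannot be parsed as a number is corrupted.
        return Reading(sensor_id, None, Flag.CORRUPTED)
    if not (minimum <= value <= maximum):
        # Assumption: a value outside the metadata range (or NaN) is corrupted.
        return Reading(sensor_id, value, Flag.CORRUPTED)
    return Reading(sensor_id, value, Flag.VALID)


for raw in ["21.5", None, "abc", "999"]:
    print(transform("sensor-1", raw, minimum=-40.0, maximum=60.0))
```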
Questions to consider in this step include: What are the source
data formats? What are the target data formats? How should the
data transformation be performed? What kind of data can be
regarded as corrupted? How will the transformed data work with
the data warehousing and mining tasks and processes? What is
the metadata? What are the usage and constraints of the data?
After the data transformation task is complete, all the
information can be loaded into the processing system and is
ready to be processed.
After the data has been preprocessed, the next fundamental
issue to consider is how to optimize the storage and processing
of the collected information in order to perform the data
warehousing and mining tasks. The related issues are discussed
in the following subsections.
One important difference between the data stream application
domain and traditional application domains is that, because
data streams are continuous, unbounded, and high speed, both
offline and online data stream applications involve a huge
amount of data. Thus, there is not enough space to store and
accumulate all the stream data and wait for bulk offline
processing, as in traditional database applications. One-scan
processing of data and compact data storage structures are
preferable in this environment.
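To make the idea of one-scan processing with compact storage concrete, the sketch below keeps a fixed-size summary (count, mean, variance, minimum, maximum) that is updated once per record, so the stream itself never has to be stored. It uses Welford's online algorithm for the variance; this is an illustrative choice rather than a structure prescribed by the text, and `RunningSummary` is a hypothetical name.

```python
import math


class RunningSummary:
    """Constant-size summary of a numeric stream, updated in a single scan."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0            # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, x):
        """Fold one record into the summary (Welford's online algorithm)."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self):
        """Population variance of the records seen so far."""
        return self._m2 / self.count if self.count else 0.0


summary = RunningSummary()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:   # stands in for an unbounded stream
    summary.update(value)
print(summary.count, summary.mean, summary.variance)
```

Each record is seen exactly once and then discarded, and the memory used does not grow with the length of the stream.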
Data Storage
In this stage, data is stored in different data structures
associated with the respective data warehousing and mining
tasks. An efficient and compact data structure is needed to
store, update, and retrieve the collected information. This is
done to the bounded
Data Loading
Data loading is the last step of the data preprocessing stage.
In this step, the extracted and transformed data is loaded into
the processing system.