A Framework for Data Warehousing and Mining in Sensor Stream Application Domains - Evolving Application Domains of Data Warehousing and Mining


Questions to consider in this step include: Which information do we need to extract? Why do we need to collect it? How can the collected data work with the data warehousing and mining tasks and processes? After the needed information is extracted and collected from the data transportation package, metadata about the application domain can be gathered from the end user or from the collected information, for example the data description of the application domain, the data ranges, and the data types. All of this associated information can be loaded into the framework after being converted to the required format.
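As an illustration of how such metadata might be represented and used, the following minimal sketch stores a description and a valid range for one sensor attribute and uses them to classify a raw reading. The names `SensorMeta`, `Flag`, and `classify` are hypothetical, and the rule that an out-of-range value counts as corrupted is an assumption made for this example, not something the framework prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Flag(Enum):
    """Hypothetical validity indicator for a single reading."""
    OK = "ok"
    MISSING = "missing"
    CORRUPTED = "corrupted"


@dataclass(frozen=True)
class SensorMeta:
    """Hypothetical metadata record for one sensor attribute."""
    description: str   # data description supplied by the end user
    min_value: float   # lower bound of the valid data range
    max_value: float   # upper bound of the valid data range


def classify(meta: SensorMeta, value: Optional[float]) -> Flag:
    """Classify a raw reading against its metadata.

    Assumed rule: None means missing; anything outside the
    declared range (including NaN) is treated as corrupted.
    """
    if value is None:
        return Flag.MISSING
    if not (meta.min_value <= value <= meta.max_value):
        return Flag.CORRUPTED
    return Flag.OK


temperature = SensorMeta("air temperature, degrees Celsius", -40.0, 60.0)
flags = [classify(temperature, v) for v in (21.5, None, 999.0)]
```

In a real deployment the definition of "corrupted" would come from the application domain, which is why the range lives in the metadata record rather than in the checking code.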

In the data loading step, the extracted and transformed data is loaded into the processing system.

Points to consider in this step: Can the data processing rate keep up with the data arrival rate? The mining system should process data faster than it arrives; otherwise data approximation techniques, such as sampling and load shedding, need to be applied. If so, the questions to consider include: Which rate-adjusting techniques are suitable? How will they affect the accuracy of the mining result? What is the maximum allowable error rate specified by the end user? How can the selected technique be applied to the application system?
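The trade-off between arrival rate and processing rate can be sketched with simple random load shedding. This is only one possible approximation technique, shown under the assumption that both rates are known and roughly constant; the function names `keep_probability` and `shed_load` are hypothetical.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def keep_probability(arrival_rate: float, processing_rate: float) -> float:
    """Fraction of tuples the system can afford to process (at most 1.0)."""
    if arrival_rate <= 0:
        return 1.0
    return min(1.0, processing_rate / arrival_rate)


def shed_load(stream: Iterable[T], keep_prob: float,
              rng: random.Random) -> Iterator[T]:
    """Randomly drop tuples so that roughly keep_prob of them survive."""
    for item in stream:
        if rng.random() < keep_prob:
            yield item


rng = random.Random(0)  # fixed seed so the sketch is repeatable
p = keep_probability(arrival_rate=1000.0, processing_rate=250.0)
kept = list(shed_load(range(10_000), p, rng))

# Aggregate counts computed on the kept tuples can be scaled by 1/p
# to estimate the value for the full stream.
estimated_total = len(kept) / p
```

The smaller the keep probability, the larger the error of any estimate derived from the surviving tuples, which is why the end user's maximum allowable error rate constrains how aggressively data can be dropped.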

Data Transformation

In the data transformation step, data is transformed into formats that the processing unit can handle. The metadata is used to generate indicator flags that reflect the data's validity, i.e., whether a value is missing, corrupted, or neither. Each set of information from a particular sensor is linked to the sensor's identifier, which makes it easier to identify sensor relationships and perform data analysis. More details are discussed in the framework section.

Questions to consider in this step include: What are the source data formats? What are the target data formats? How should the data transformation be performed? What kind of data can be regarded as corrupted? How can the transformed data work with the data warehousing and mining tasks and processes? What is the metadata? What are the usage and constraints of the data? After the data transformation task is complete, all the information can be loaded into the processing system and is ready to be processed.

Data Storage and Processing Issues

After the data is preprocessed, the next fundamental issue is how to optimize storage and process the collected information to perform the data warehousing and mining tasks. The related issues are discussed in the following subsections.

One important difference between the data stream application domain and traditional application domains is that, because data streams are continuous, unbounded, and high-speed, both offline and online data stream applications involve a huge amount of data. Thus, there is not enough space to store and accumulate all the stream data and wait for bulk offline processing, as in traditional database applications. One-scan processing of the data and compact data storage structures are preferable in this environment.
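To make the idea of one-scan processing with a compact structure concrete, the sketch below maintains the mean and variance of a stream using Welford's online algorithm. It is an illustrative choice rather than a structure taken from the framework; the class name `RunningStats` is hypothetical. Each reading is seen once and then discarded, and the summary occupies constant space regardless of stream length.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-space summary of a numeric stream (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one reading into the summary; the reading is not stored."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the readings seen so far."""
        return self.m2 / self.count if self.count else 0.0


stats = RunningStats()
for reading in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.update(reading)
```

The same pattern, a small state object with an `update` method called once per tuple, applies to other stream summaries such as histograms or sketches.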

Data Storage

In this stage, data is stored in different data structures associated with the respective data warehousing and mining tasks. Efficient and compact data structures are needed to store, update, and retrieve the collected information. This is done to the bounded

Data Loading

Data loading is the last step of the data preprocessing stage.
