A Framework for Data Warehousing and Mining in Sensor Stream Application Domains - Evolving Application Domains of Data Warehousing and Mining


Data Preprocessing Stage

At the data preprocessing stage, the collected raw data were first integrated into a single sensor stream ordered by arrival timestamp. The stream was then cleaned by rounding each detected air temperature to the nearest integer, which represents the temperature at that sensor location during the reported time interval. The cleaned data were then enriched with domain information; in this case, each sensor identifier was combined with the sensor values reported during the different time intervals. The metadata is used to generate indicator flags that reflect the validity of each reading, i.e., whether it is missing, corrupted, or neither. In this case, a reading is defined as missing if no value was reported, and as corrupted if the reported value lies outside the specified temperature range. After preprocessing, each sensor reading is associated with a sensor identifier and a sequence number that represents the time interval in which the reading was reported, and the metadata contains the location of the sensor represented by each sensor identifier.

Query Processing Stage

At the query processing stage, which is the top level of the domain-driven framework, queries from different users can be answered at the same time according to each user's specified criteria. In this case, the queries ask for the temperatures at different locations during a specified time interval. If the requested information is not missing, it can be retrieved directly from the sensor network database. Otherwise, the request is sent to the data estimation component at the data warehousing and mining level, and the estimated results are returned to the end users.
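As a rough illustration of the preprocessing and query processing stages described above, the following Python sketch orders raw readings by arrival time, rounds valid temperatures, flags missing and corrupted readings, and routes a query either to the stored data or to an estimator. The names `Reading`, `preprocess`, and `answer_query`, the tuple layout of the raw input, the temperature bounds, and the half-up rounding rule are all assumptions made for this example; they do not come from the original system. Sending corrupted readings to the estimator is likewise an assumption.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical bounds; the text only refers to "the specified temperature range".
TEMP_MIN, TEMP_MAX = -40.0, 60.0


@dataclass
class Reading:
    sensor_id: str
    seq: int               # sequence number of the reporting interval
    value: Optional[int]   # rounded temperature, or None if unusable
    flag: str              # "ok", "missing", or "corrupted"


def preprocess(raw: Iterable[Tuple[float, str, int, Optional[float]]]) -> List[Reading]:
    """Each raw item is (arrival_timestamp, sensor_id, seq, temperature or None)."""
    # Integrate all readings into one stream ordered by arrival time.
    stream = sorted(raw, key=lambda r: r[0])
    cleaned = []
    for _, sensor_id, seq, temp in stream:
        if temp is None:
            cleaned.append(Reading(sensor_id, seq, None, "missing"))
        elif not (TEMP_MIN <= temp <= TEMP_MAX):
            cleaned.append(Reading(sensor_id, seq, None, "corrupted"))
        else:
            # Round to the nearest integer; rounding ties upward is an assumption.
            cleaned.append(Reading(sensor_id, seq, math.floor(temp + 0.5), "ok"))
    return cleaned


def answer_query(db: Dict[Tuple[str, int], Reading], sensor_id: str, seq: int,
                 estimate: Callable[[str, int], int]) -> int:
    """Return the stored value if usable, otherwise fall back to the estimator."""
    reading = db.get((sensor_id, seq))
    if reading is not None and reading.flag == "ok":
        return reading.value
    # Missing (and, by assumption, corrupted) readings go to the estimation component.
    return estimate(sensor_id, seq)
```

A caller would typically build the lookup table once, for example `db = {(r.sensor_id, r.seq): r for r in preprocess(raw)}`, and pass any estimation function as `estimate`.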

Performance Study

Several data mining techniques were used to evaluate the proposed framework: the Average Window Size (AWS) approach, the linear interpolation approach, the linear trend approach, and the CARM approach (Jiang, 2007). Each method was applied within the proposed framework to answer user requests for missing sensor air temperature values. We compared the estimation accuracy, running time, and memory usage of each method.
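To make the simpler baselines concrete, the sketch below shows two generic ways to estimate a missing value in a single sensor's time series: linear interpolation between the nearest known neighbors, and the mean of the most recent known values. The function names and the `window` parameter are hypothetical. The second function is only a simplified window-average stand-in; it is not claimed to reproduce the AWS approach as evaluated in the study.

```python
from typing import List, Optional


def linear_interpolate(series: List[Optional[float]], i: int) -> Optional[float]:
    """Estimate series[i] from the nearest known values before and after it."""
    left = next((j for j in range(i - 1, -1, -1) if series[j] is not None), None)
    right = next((j for j in range(i + 1, len(series)) if series[j] is not None), None)
    if left is None or right is None:
        return None  # no known value on one side, so interpolation is undefined
    frac = (i - left) / (right - left)
    return series[left] + frac * (series[right] - series[left])


def window_mean(series: List[Optional[float]], i: int, window: int = 3) -> Optional[float]:
    """Mean of up to `window` most recent known values before position i."""
    known = [v for v in series[:i] if v is not None][-window:]
    return sum(known) / len(known) if known else None
```

For the series `[20.0, 22.0, None, 26.0]`, interpolation at index 2 uses the values 22.0 and 26.0 on either side, while the window mean uses only the readings that precede the gap.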

Data Warehousing and Data Mining Stage

At the data warehousing and data mining stage, which is the third level of the domain-driven framework, different data warehousing and data mining tasks can be performed on the preprocessed data enriched with domain information. In this case, we perform an association mining task within each sensor cluster of the Huntington Botanical Garden sensor network application to find interesting patterns and associations among the sensor readings. We then use the discovered relationships to estimate a missing sensor reading from the sensor readings related to it.
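The idea of estimating a missing reading from related readings can be sketched with a very simple pairwise rule miner. This is an illustrative stand-in only, not the CARM algorithm: it counts how often each value of one sensor co-occurs with each value of another sensor in the same time interval, keeps the most frequent pairing when its confidence passes a threshold, and uses that rule to fill a gap. The function names, the row format, and the `min_conf` default are assumptions for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def mine_pair_rules(history: List[Dict[str, int]], src: str, dst: str,
                    min_conf: float = 0.6) -> Dict[int, Tuple[int, float]]:
    """Map each observed value of `src` to (most frequent `dst` value, confidence)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for row in history:  # one row per time interval: {sensor_id: rounded reading}
        if src in row and dst in row:
            counts[row[src]][row[dst]] += 1
    rules = {}
    for src_val, dist in counts.items():
        dst_val, hits = dist.most_common(1)[0]
        conf = hits / sum(dist.values())
        if conf >= min_conf:  # keep only sufficiently reliable rules
            rules[src_val] = (dst_val, conf)
    return rules


def estimate_missing(rules: Dict[int, Tuple[int, float]], src_value: int) -> Optional[int]:
    """Estimate the missing `dst` reading from the current `src` reading, if a rule exists."""
    rule = rules.get(src_value)
    return rule[0] if rule else None
```

A real association miner would consider relationships among many sensors at once rather than a single pair, which is the kind of broader coverage the chapter attributes to CARM.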

Performance Study of Estimation Accuracy

The estimation accuracy for the missing values is evaluated using the average Root Mean Square Error (RMSE).
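For reference, RMSE is the square root of the mean squared difference between estimated and actual values. A minimal Python sketch follows; the function name and argument order are choices made for this example, and how the study averages RMSE across experiments is not specified here.

```python
import math
from typing import Sequence


def rmse(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between paired estimated and actual values."""
    if len(estimated) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((e - a) ** 2 for e, a in zip(estimated, actual))
    return math.sqrt(sum(squared_errors) / len(actual))
```

A lower RMSE means the estimated temperatures lie closer to the values that were actually observed.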

From Figure 8, we can see that CARM gives the best estimation accuracy of the approaches above. The AWS approach and the linear approaches perform no better than CARM. The main reason might be that these approaches consider only the relationships between neighboring nodes, while CARM finds all of the relationships
