The actual process of cleaning may be much more complicated in an
application containing thousands of readers and millions of tags. In such
a case, the cleaning process may incur tremendous costs, associated
either with the cleaning plan itself or with the misclassification of the
cleaned data records. Therefore, a method for cost-conscious cleaning of
massive RFID data sets has been proposed in [27].
The work in [27] assumes that three different kinds of inputs are
available:
- A set of tag readings, which form a representative sample of the
possible set of readings. Each reading is associated with the correct
location of the tag, contextual information, area conditions, and tag
protocol.
- A set of cleaning methods with associated per-tuple cleaning costs.
- A per-tuple misclassification cost, which may be constant or a
function of the tag reading and the incorrectly assigned location.
The goal of the cost-sensitive approach is to learn a cleaning plan that
identifies the conditions (feature values) under which a specific cleaning
method or a sequence of cleaning methods should be applied in order
to minimize the expected cleaning costs, including error costs. The
The work in [27] proposes a cleaning method which dynamically adjusts the
probability of tag presence based on the last observation. This is
essentially a Dynamic Bayesian Network (DBN) approach. It has been shown
in [27] that such an approach can outperform or complement methods
which are based on smoothing windows. One advantage of DBN-based
cleaning is that it does not require storing a window of recent tag
readings (as a window-based method does), while still giving more
importance to recent readings, since the probability of tag presence is
continuously adjusted by the incoming tag readings.
A method called StreamClean has been proposed in [46], which uses
global integrity constraints in order to clean the data. The core idea in
StreamClean is that the tuples in a data stream system are not random,
but are often related to one another, according to application-specific
criteria. An example of such an integrity constraint provided in [46] can
be as follows:
A car parked in the garage at time t_e < t must either have exited in
(t_e, t), or it must still be parked at time t.
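Such a constraint can be sketched as a predicate over the event stream. The event representation below (tuples of car id, action, timestamp) is an assumption for illustration; StreamClean itself works with declaratively specified constraints and probabilistic correction rather than a hand-written check:

```python
def violates_constraint(events, car, t):
    """Return True if the stream of (car, action, time) events violates:
    a car parked in the garage at time t_e < t must either have exited
    in (t_e, t), or still be observed parked at time t."""
    entries = [tm for c, a, tm in events
               if c == car and a == "enter" and tm < t]
    if not entries:
        return False  # the car was never parked before t
    t_e = max(entries)  # most recent entry before t
    exited = any(c == car and a == "exit" and t_e < tm < t
                 for c, a, tm in events)
    parked_at_t = any(c == car and a == "parked" and tm == t
                      for c, a, tm in events)
    return not (exited or parked_at_t)

# A car that entered at time 1 and exited at time 3 satisfies the
# constraint at time 5; a car that entered and then vanished does not.
ok_events  = [("car1", "enter", 1), ("car1", "exit", 3)]
bad_events = [("car2", "enter", 1)]
```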
In essence, the approach in StreamClean requires the specification of