15.3.2 Data Harmonization Layer
Data harmonization was an essential step for data and knowledge integration. Preprocessing of the downloaded time series was particularly important because of the uncertainty associated with data availability. Each individual time series was identified according to the name of the selected site and the environmental variable. Data were available from the time of deployment. As can be expected in real-world networks, each of the available time series had periods with missing values. For some sensor nodes, there were also a number of Infinite values. Initially, a filter was designed to remove all of the Infinite values and replace them with a "Not a Number" (NaN) string, so that the filtering remained statistically insignificant and the original time frame was left unaltered.
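A minimal sketch of this filtering step, assuming the downloaded time series is held in a pandas Series indexed by timestamp (the function and variable names are illustrative, not taken from the original system):

import numpy as np
import pandas as pd

def filter_infinite_values(series: pd.Series) -> pd.Series:
    # Replace +/-Inf readings with NaN so that the original time frame
    # is preserved and downstream statistics simply ignore the samples.
    return series.replace([np.inf, -np.inf], np.nan)

Because the invalid samples are masked rather than dropped, the length and time stamps of the series stay exactly as recorded by the node.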
Data validation and preprocessing were conducted based on the available knowledge from the sensor and sensor network ontologies. Preprocessed time series data were batch processed and represented as daily averaged data. Data from different sources measuring the same environmental attribute were harmonized and cross-validated against each other. Similarly, different measured attributes from the same node were also harmonized to the daily average. This step supported the evaluation and data visualization processes by reducing the number of data points and by resolving issues related to differing data logging frequencies. It also compressed the data to a certain extent without losing any daily observation characteristics. The final outcome of this layer was multisource environmental time series data that were harmonized, unit converted where required, and semantically integrated into a single structure on a daily scale.
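As an illustration of the daily averaging and multisource integration described above, the following sketch, assuming pandas Series indexed by timestamp and hypothetical source names, resamples each preprocessed series to a daily mean and aligns them in a single daily-scale structure:

import pandas as pd

def harmonize_daily(series_by_source: dict[str, pd.Series]) -> pd.DataFrame:
    # Resample every source to daily means, then align the results on a
    # common daily index (one column per source/attribute).
    daily = {name: s.resample("D").mean() for name, s in series_by_source.items()}
    return pd.DataFrame(daily)

# Hypothetical usage: water temperature from a buoy and a simulation model
# harmonized onto the same daily time axis (unit conversion, if required,
# would be applied to the individual series before this step).
# daily_frame = harmonize_daily({
#     "buoy_water_temp_C": buoy_series,
#     "model_water_temp_C": model_series,
# })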
15.3.3 Semantic Cross-Validation Layer
Semantic representations are usually intended as a medium for conveying meaning about some world or environment. A knowledge representation must therefore have a semantic theory that provides an account of how a particular representation corresponds to the external world or environment. Preprocessed data were cross-validated using semantic metadata matching and statistical cross-correlation calculation. Metadata are "data about the data" and can provide descriptions of the what, where, who, and how of the data [5,53,61]. For example, sensor node metadata could describe when and where the sensor node was deployed, who deployed that node, which environmental attributes are being measured, what the key semantic features or characteristics of that particular sensory system are, and, finally, the valid range of measurement that could be expected. In general, metadata are used to describe the principal aspects of data with the aim of sharing, reusing, and understanding heterogeneous data sets. Different types of sensor or sensor-simulation model metadata may be considered, namely, static and dynamic sensor metadata and associated sensing information. Based on natural language processing and sensor-model ontologies, a cross-validation layer was created. Ideally, all similar environmental variables from different data sources should be able to cross-validate each other statistically, because representative similar variables for the same location and the same time frame should be statistically very similar. Variables were semantically matched according to their units, the attributes they measure, and the context of the semantically
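One plausible reading of the statistical cross-correlation calculation, sketched below under the assumption that two semantically matched variables are available as daily pandas Series (names and the agreement threshold are hypothetical), is a Pearson correlation computed over the overlapping time frame of the two series:

import pandas as pd

def cross_validate(series_a: pd.Series, series_b: pd.Series,
                   threshold: float = 0.8) -> tuple[float, bool]:
    # Align the two semantically matched daily series on their common
    # time frame, drop missing values, and compare them statistically.
    aligned = pd.concat([series_a, series_b], axis=1, join="inner").dropna()
    r = aligned.iloc[:, 0].corr(aligned.iloc[:, 1])  # Pearson correlation
    return r, r >= threshold

A high correlation supports the semantic match, whereas a low value flags the pair for closer inspection of units, metadata, or sensor behavior.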