extracted feature attributes, and scientific meaningfulness, to form several variable
subgroups, that is, SILO rainfall (mm/day), SILO rainfall rate (mm/hr), AWAP rainfall
(mm/day), CosmOz rainfall (mm/day), and the MODIS post real-time TRMM Multi-Satellite
Precipitation Analysis (TMPA) product (mm/day). A combination of this kind formed
a pool of similar variables that could cross-validate or complement
each other when values were missing from a particular time series within that pool.
The complementary method identified the missing-value segments of a time series
and replaced those segments with an average segment computed from the other available
time series in the same pool. This was done to model a missing data segment as a semantic
attribute. Sensor model ontologies were used in this processing to apply the correct
meaning of each time series and avoid any wrong complement. Next, a "cross-correlation
technique" was used to measure the similarity between two complemented time
series signals representing similar scenarios (in terms of location and time period).
The other purpose of this layer was to cross-validate similar time series data in the
same pool to find a representative time series for that particular pool [10,28,29]. If
the two signals being compared were completely identical, the cross-correlation
coefficient should equal 1, and if there were no significant similarities between the
signals it should be close to 0. A scoring protocol was designed on the cross-correlation
results. The time series with the highest score was selected from each subgroup as the best
representative of the associated environmental variable for that time period. The
selected time series from all attribute pools were stored in an integrated structured
array where columns represented different variables and rows represented time
frames.
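The complement and cross-validation steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: missing values are marked with None, the "average segment" is taken as the per-time-step mean of the other series in the pool, the cross-correlation is taken at zero lag and normalized (the Pearson coefficient), and the scoring protocol, which the text does not specify, is assumed to be the mean correlation of a series with the other pool members. The function names and the sample rainfall values are hypothetical.

```python
import math

def fill_from_pool(pool):
    """Fill gaps (None) in each series with the mean of the other
    series in the same pool at the same time step (assumed rule)."""
    n_steps = len(next(iter(pool.values())))
    filled = {}
    for name, series in pool.items():
        out = []
        for t in range(n_steps):
            if series[t] is not None:
                out.append(series[t])
                continue
            others = [s[t] for k, s in pool.items()
                      if k != name and s[t] is not None]
            # Keep the gap when no other series has a value here.
            out.append(sum(others) / len(others) if others else None)
        filled[name] = out
    return filled

def xcorr0(a, b):
    """Zero-lag normalized cross-correlation (Pearson coefficient),
    computed over the time steps where both series have a value."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_b = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no shape to compare
    return cov / math.sqrt(var_a * var_b)

def pick_representative(pool):
    """Score each complemented series by its mean correlation with the
    other pool members (hypothetical scoring) and return the best."""
    filled = fill_from_pool(pool)
    scores = {}
    for name in filled:
        corrs = [xcorr0(filled[name], filled[k])
                 for k in filled if k != name]
        scores[name] = sum(corrs) / len(corrs) if corrs else 0.0
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical daily rainfall values (mm/day) for one pool.
rain_pool = {
    "SILO": [0.0, 2.0, None, 5.0, 1.0],
    "AWAP": [0.1, 2.1, 4.0, 5.2, 0.9],
    "TMPA": [3.0, 0.0, 4.2, 1.0, 2.5],
}
best_name, pool_scores = pick_representative(rain_pool)
```

Filling from the original (unfilled) pool, rather than from already-filled series, keeps one imputed value from feeding into another.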
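\[
R =
\begin{bmatrix}
x_{11}(t) & x_{12}(t) & \cdots & x_{1m}(t) \\
x_{21}(t) & x_{22}(t) & \cdots & x_{2m}(t) \\
\vdots & \vdots & x_{ij}(t) & \vdots \\
x_{n1}(t) & x_{n2}(t) & \cdots & x_{nm}(t)
\end{bmatrix}
\tag{15.1}
\]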
The integrated data were represented as a response matrix R, where $x_{ij}(t)$ represents the daily
value of variable i on date j, which is the jth location on the common time frame
(Equation 15.1).
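A minimal sketch of assembling such a matrix from the selected series is given below. It assumes the layout stated for the integrated array (one row per time frame, one column per variable), that each selected series is held as a date-to-value mapping, and that dates missing from a series are left as None. The function name and sample values are hypothetical.

```python
def build_response_matrix(selected, dates):
    """Arrange the selected series into a rows-by-columns array:
    one row per date on the common time frame, one column per variable."""
    columns = sorted(selected)  # fixed, reproducible column order
    matrix = [[selected[var].get(d) for var in columns] for d in dates]
    return columns, matrix

# Hypothetical representative series chosen from two variable pools.
selected = {
    "rainfall_mm_day": {"2012-01-01": 0.0, "2012-01-02": 2.1},
    "soil_moisture": {"2012-01-01": 0.31},
}
common_dates = ["2012-01-01", "2012-01-02"]
columns, R = build_response_matrix(selected, common_dates)
```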
15.3.4 Feature Representation Layer
An important issue with multidimensional Big Data sources is optimal feature
extraction to represent the knowledge in fewer dimensions. Data mining or
unsupervised machine learning techniques are widely used for feature extraction in the
physical, chemical, and environmental sciences [21,34,54,73]. The purpose of this layer
was to preprocess the time series matrix and extract sets of semantic features from it,
creating a reduced, semantically enriched representation in place of the full-size
input, so that the most relevant and significant information in
the input data would be captured to solve the multivariate problem. The general
multivariate problem in large-scale environmental sensing is commonly referred to