6. ADVANCED TECHNIQUES
representation. As a result, sequential probabilistic databases require changes to our theoretical and
practical infrastructure.
One popular statistical model for extracting sequential structure from raw, unstructured data
is the Hidden Markov Model (HMM) [ Rabiner , 1990 ]. Abstractly, an HMM takes as input a sequence
of observations and produces as output a probability distribution over a sequence of hidden states .
HMMs can be used to extract structure that is useful in a diverse set of applications: In RFID appli-
cations, the observations are the low-level antenna sightings, and the hidden states are sequences of
higher-level locations, such as rooms or hallways [ Letchner et al. , 2009 , Ré et al. , 2008 ]. In speech
applications, the observations are acoustic signals, and the hidden states are sequences of words
or phonemes [ CMUSphinx , 2010 , HTK , 2009 , Letchner et al. , 2009 , Rabiner , 1990 , Sha and Saul ,
2007 ]. There are other data-rich applications that use HMMs as well: sequence matching in biologi-
cal data [ Durbin et al. , 2006 , HMMER , 2010 ], optical character recognition [ Chen et al. , 1994 ], and
image classification [ Fosgate et al. , 1997 ]. To capture the output of statistical models like HMMs,
Kanagal and Deshpande [ 2009 ] and Ré et al. [ 2008 ] defined Markov Sequence Databases . In this ap-
proach, a database contains several Markov Sequences , which are sequences of random variables that
have the Markov property, i.e., a random variable is independent of its history given its predecessor.
Markov sequences exactly capture the output of an HMM after the HMM “sees” the observations.
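To make this last point concrete, the following Python sketch (a two-state HMM with toy, invented transition and emission probabilities) runs the standard forward-backward algorithm and extracts the chain conditioned on the observations: the posterior initial distribution and the per-position posterior transitions, which together form exactly the kind of Markov sequence defined next.

```python
import numpy as np

# Toy HMM: two hidden states, two observation symbols. All numbers invented.
pi = np.array([0.6, 0.4])             # initial hidden-state distribution
A = np.array([[0.7, 0.3],             # A[s, t] = P(next state t | state s)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],             # B[s, o] = P(observation o | state s)
              [0.3, 0.7]])
obs = [0, 1, 1]                       # an observed sequence
n = len(obs)

# Forward pass: alpha[i, s] = P(obs[0..i], X_i = s)
alpha = np.zeros((n, 2))
alpha[0] = pi * B[:, obs[0]]
for i in range(1, n):
    alpha[i] = (alpha[i - 1] @ A) * B[:, obs[i]]

# Backward pass: beta[i, s] = P(obs[i+1..] | X_i = s)
beta = np.ones((n, 2))
for i in range(n - 2, -1, -1):
    beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])

evidence = alpha[-1].sum()            # P(obs)
gamma = alpha * beta / evidence       # gamma[i, s] = P(X_i = s | obs)

# Posterior transitions mu_i(s, t) = P(X_{i+1} = t | X_i = s, obs).
# gamma[0] plays the role of the initial-state distribution mu_0.
mu = np.zeros((n - 1, 2, 2))
for i in range(n - 1):
    mu[i] = A * (B[:, obs[i + 1]] * beta[i + 1]) / beta[i][:, None]
```

Each row of `gamma` and each row of every `mu[i]` sums to 1, i.e., the conditioned chain is again a (non-homogeneous) Markov chain over the hidden states.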
Formally, a Markov sequence μ of length n operates over a finite set Σ of state nodes (or
just nodes),2 and it comprises an initial-state distribution μ_0 and transition functions μ_i for all
1 ≤ i < n, where μ_0 and μ_i are defined as follows.
• μ_0 : Σ → [0, 1] is a function such that ∑_{s∈Σ} μ_0(s) = 1.
• μ_i : Σ × Σ → [0, 1] is a function such that for all s ∈ Σ, ∑_{t∈Σ} μ_i(s, t) = 1.
The set of the state nodes of the Markov sequence μ is denoted by Σ_μ. We may write μ[n] instead
of μ to denote that μ is a Markov sequence of length n.
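The definition above translates directly into a small data structure: a distribution μ_0 over Σ plus one transition table μ_i per position. A minimal Python sketch (state names and probabilities are invented for illustration) checks the two stochasticity conditions and computes the probability of a concrete state sequence as μ_0(s_0) · ∏ μ_i(s_{i−1}, s_i):

```python
# State nodes Sigma of the Markov sequence (names are illustrative).
Sigma = ["room", "hall"]

# mu_0: initial-state distribution over Sigma.
mu0 = {"room": 0.5, "hall": 0.5}

# mu_i for 1 <= i < n: one transition table per position. Unlike a
# homogeneous Markov chain, each position may have a different mu_i.
mus = [
    {("room", "room"): 0.9, ("room", "hall"): 0.1,
     ("hall", "room"): 0.4, ("hall", "hall"): 0.6},
    {("room", "room"): 0.2, ("room", "hall"): 0.8,
     ("hall", "room"): 0.5, ("hall", "hall"): 0.5},
]
n = len(mus) + 1  # length of the Markov sequence

# Stochasticity conditions from the definition:
assert abs(sum(mu0[s] for s in Sigma) - 1.0) < 1e-9
for mu_i in mus:
    for s in Sigma:
        assert abs(sum(mu_i[(s, t)] for t in Sigma) - 1.0) < 1e-9

def seq_prob(states):
    """P(X_0 = states[0], ..., X_{n-1} = states[n-1])."""
    p = mu0[states[0]]
    for i in range(1, len(states)):
        p *= mus[i - 1][(states[i - 1], states[i])]
    return p

print(seq_prob(["room", "room", "hall"]))  # 0.5 * 0.9 * 0.8 = 0.36
```

By the Markov property, no further information is needed to answer any query about the joint distribution of the states; the per-position tables determine it completely.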
Example 6.5 Our example of RFID for Markovian sequences is adapted from [ Kimelfeld and Ré ,
2010 ]. In particular, we consider a hospital where transmitters are attached to medical equipment.
Each transmitter transmits discrete signals with fixed time intervals in between. Sensors are spread
over the hospital in known locations (e.g., rooms, hallways, laboratories, etc.). A transmission contains
the transmitter identifier and a sensor identifier .
2 Note that Σ denotes the nodes of a Markov sequence, whereas for an NFA it denotes the alphabet. This is not accidental, since
we later use the state nodes as the alphabet of an NFA.