6. ADVANCED TECHNIQUES
representation. As a result, sequential probabilistic databases require changes to our theoretical and
practical infrastructure.
One popular statistical model for extracting sequential structure from raw, unstructured data
is the Hidden Markov Model (HMM) [ Rabiner , 1990 ]. Abstractly, an HMM takes as input a sequence
of observations and produces as output a probability distribution over a sequence of hidden states .
HMMs can be used to extract structure that is useful in a diverse set of applications: In RFID appli-
cations, the observations are the low-level antenna sightings, and the hidden states are sequences of
higher-level locations, such as rooms or hallways [ Letchner et al. , 2009 , Ré et al. , 2008 ]. In speech
applications, the observations are acoustic signals, and the hidden states are sequences of words
or phonemes [ CMUSphinx , 2010 , HTK , 2009 , Letchner et al. , 2009 , Rabiner , 1990 , Sha and Saul ,
2007 ]. There are other data-rich applications that use HMMs as well: sequence matching in biologi-
cal data [ Durbin et al. , 2006 , HMMER , 2010 ], optical character recognition [ Chen et al. , 1994 ], and
image classification [ Fosgate et al. , 1997 ]. To capture the output of statistical models like HMMs,
Kanagal and Deshpande [ 2009 ] and Ré et al. [ 2008 ] defined Markov Sequence Databases . In this ap-
proach, a database contains several Markov Sequences , which are sequences of random variables that
have the Markov property, i.e., a random variable is independent of its history given its predecessor.
Markov sequences exactly capture the output of an HMM after the HMM “sees” the observations.
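To make this last point concrete, the following Python sketch (a two-state HMM with toy, invented transition and emission probabilities) runs the standard forward-backward algorithm and extracts the chain conditioned on the observations: the posterior initial distribution and the per-position posterior transitions, which together form exactly the kind of Markov sequence defined next.

```python
import numpy as np

# Toy HMM: two hidden states, two observation symbols. All numbers invented.
pi = np.array([0.6, 0.4])             # initial hidden-state distribution
A = np.array([[0.7, 0.3],             # A[s, t] = P(next state t | state s)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],             # B[s, o] = P(observation o | state s)
              [0.3, 0.7]])
obs = [0, 1, 1]                       # an observed sequence
n = len(obs)

# Forward pass: alpha[i, s] = P(obs[0..i], X_i = s)
alpha = np.zeros((n, 2))
alpha[0] = pi * B[:, obs[0]]
for i in range(1, n):
    alpha[i] = (alpha[i - 1] @ A) * B[:, obs[i]]

# Backward pass: beta[i, s] = P(obs[i+1..] | X_i = s)
beta = np.ones((n, 2))
for i in range(n - 2, -1, -1):
    beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])

evidence = alpha[-1].sum()            # P(obs)
gamma = alpha * beta / evidence       # gamma[i, s] = P(X_i = s | obs)

# Posterior transitions mu_i(s, t) = P(X_{i+1} = t | X_i = s, obs).
# gamma[0] plays the role of the initial-state distribution mu_0.
mu = np.zeros((n - 1, 2, 2))
for i in range(n - 1):
    mu[i] = A * (B[:, obs[i + 1]] * beta[i + 1]) / beta[i][:, None]
```

Each row of `gamma` and each row of every `mu[i]` sums to 1, i.e., the conditioned chain is again a (non-homogeneous) Markov chain over the hidden states.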
Formally, a Markov sequence μ of length n operates over a finite set Σ of state nodes (or
just nodes),2 and it comprises an initial-state distribution μ_0 and transition functions μ_i for all
1 ≤ i < n, where μ_0 and μ_i are defined as follows.
• μ_0 : Σ → [0, 1] is a function such that ∑_{s∈Σ} μ_0(s) = 1.
• μ_i : Σ × Σ → [0, 1] is a function such that for all s ∈ Σ, ∑_{t∈Σ} μ_i(s, t) = 1.
The set of the state nodes of the Markov sequence μ is denoted by Σ_μ. We may write μ[n] instead
of μ to denote that μ is a Markov sequence of length n.
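The definition above translates directly into a small data structure: a distribution μ_0 over Σ plus one transition table μ_i per position. A minimal Python sketch (state names and probabilities are invented for illustration) checks the two stochasticity conditions and computes the probability of a concrete state sequence as μ_0(s_0) · ∏ μ_i(s_{i−1}, s_i):

```python
# State nodes Sigma of the Markov sequence (names are illustrative).
Sigma = ["room", "hall"]

# mu_0: initial-state distribution over Sigma.
mu0 = {"room": 0.5, "hall": 0.5}

# mu_i for 1 <= i < n: one transition table per position. Unlike a
# homogeneous Markov chain, each position may have a different mu_i.
mus = [
    {("room", "room"): 0.9, ("room", "hall"): 0.1,
     ("hall", "room"): 0.4, ("hall", "hall"): 0.6},
    {("room", "room"): 0.2, ("room", "hall"): 0.8,
     ("hall", "room"): 0.5, ("hall", "hall"): 0.5},
]
n = len(mus) + 1  # length of the Markov sequence

# Stochasticity conditions from the definition:
assert abs(sum(mu0[s] for s in Sigma) - 1.0) < 1e-9
for mu_i in mus:
    for s in Sigma:
        assert abs(sum(mu_i[(s, t)] for t in Sigma) - 1.0) < 1e-9

def seq_prob(states):
    """P(X_0 = states[0], ..., X_{n-1} = states[n-1])."""
    p = mu0[states[0]]
    for i in range(1, len(states)):
        p *= mus[i - 1][(states[i - 1], states[i])]
    return p

print(seq_prob(["room", "room", "hall"]))  # 0.5 * 0.9 * 0.8 = 0.36
```

By the Markov property, no further information is needed to answer any query about the joint distribution of the states; the per-position tables determine it completely.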
Example 6.5 Our example of RFID for Markovian sequences is adapted from [ Kimelfeld and Ré ,
2010 ]. In particular, we consider a hospital where transmitters are attached to medical equipment.
Each transmitter transmits discrete signals with fixed time intervals in between. Sensors are spread
over the hospital in known locations (e.g., rooms, hallways, laboratories, etc.). A transmission contains
the transmitter identifier and a sensor identifier .
2 Note that Σ denotes the nodes of a Markov sequence, whereas for an NFA it denotes the alphabet. This is not accidental, since
we later use the state nodes as the alphabet of an NFA.