Similar to Thórisson (2002a), pitch is further analyzed by a
Prosody Analyzer perception module, which computes a more compact
representation of the pitch pattern in a discrete state space; in our
case, this representation supports the learning. The most recent tail of
speech right before a silence, the last 300 msecs, is analyzed to detect
the minimum and maximum values of the fundamental pitch, producing a
tail-slope pattern of the pitch. Slope is split into semantic categories.
In the present implementation we have used three categories for slope
(Up, Straight and Down, according to Formula 1) and three for the
relative value of pitch right before silence (Above, At and Below, as
compared to the average pitch, according to Formula 2).
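Formula 1:

$$
\text{slope} =
\begin{cases}
\text{Up} & \text{if } m > 0.05 \\
\text{Straight} & \text{if } -0.05 \le m \le 0.05 \\
\text{Down} & \text{if } m < -0.05
\end{cases}
\qquad
m = \frac{\Delta\,\text{pitch}}{\Delta\,\text{msecs}}
$$

Formula 2:

$$
\text{end} =
\begin{cases}
\text{Above} & \text{if } d_{\text{end}} > P_t \\
\text{At} & \text{if } -P_t \le d_{\text{end}} \le P_t \\
\text{Below} & \text{if } d_{\text{end}} < -P_t
\end{cases}
\qquad
d_{\text{end}} = \text{pitch}_{\text{end}} - \text{pitch}_{\text{avg}}
$$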
where P_t = 10 is the tolerance, so that At covers the pitch average ± 10,
i.e. the pitch average with a bit of tolerance for deviation.
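The tail analysis and the two formulas can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Prosody Analyzer itself. The 300 msec window, the 0.05 slope threshold and the tolerance P_t = 10 come from the text. Everything else is a hypothetical choice:

- Pitch arrives as `(time in msecs, F0 in Hz)` samples.
- m is taken as the slope between the F0 minimum and maximum of the tail, ordered in time.
- The units are assumed to be Hz per msec and Hz.
- The average pitch is supplied by the caller, because the text does not say over what span it is computed.

```python
from typing import List, Optional, Tuple

# Values taken from the text: a 300 msec tail, a slope threshold of 0.05
# (Formula 1) and a tolerance P_t = 10 around the average pitch (Formula 2).
TAIL_MS = 300
SLOPE_THRESHOLD = 0.05   # assumed unit: Hz per msec
P_T = 10.0               # assumed unit: Hz

Sample = Tuple[float, float]  # hypothetical format: (time in msecs, F0 in Hz)


def tail_slope(tail: List[Sample]) -> float:
    """Slope m between the F0 minimum and maximum of the tail (assumed method)."""
    low = min(tail, key=lambda s: s[1])
    high = max(tail, key=lambda s: s[1])
    (t0, f0), (t1, f1) = sorted([low, high])  # order the two extremes in time
    if t1 == t0:  # flat tail: min and max resolve to the same sample
        return 0.0
    return (f1 - f0) / (t1 - t0)


def classify_slope(m: float) -> str:
    """Formula 1: Up / Straight / Down."""
    if m > SLOPE_THRESHOLD:
        return "Up"
    if m < -SLOPE_THRESHOLD:
        return "Down"
    return "Straight"


def classify_end(end_pitch: float, avg_pitch: float) -> str:
    """Formula 2: compare d_end = pitch_end - pitch_avg with the tolerance P_t."""
    d_end = end_pitch - avg_pitch
    if d_end > P_T:
        return "Above"
    if d_end < -P_T:
        return "Below"
    return "At"


def analyze_tail(samples: List[Sample], silence_start_ms: float,
                 avg_pitch: float) -> Optional[Tuple[str, str]]:
    """Categorize the last TAIL_MS msecs of pitch samples before a silence."""
    tail = sorted(s for s in samples
                  if silence_start_ms - TAIL_MS <= s[0] < silence_start_ms)
    if len(tail) < 2:
        return None  # too few pitch samples to estimate a slope
    end_pitch = tail[-1][1]  # pitch of the last sample before the silence
    return classify_slope(tail_slope(tail)), classify_end(end_pitch, avg_pitch)
```

For example, a tail rising from 100 Hz to about 158 Hz over 290 msecs, with an average pitch of 120 Hz, is categorized as `("Up", "Above")`. Using only the extremes keeps the computation cheap, but a single outlier sample can dominate m. A least-squares fit over the whole tail would be a more robust alternative if the source's method differs.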
The primary output of the Prosody Analyzer is a symbolic
representation of the particular prosody pattern identified in this tail
period (see Figure 3). More features could be added to the symbolic
representation, with the obvious side effect of enlarging the state space.
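The cost of adding features can be made concrete. With three slope and three end categories the symbolic state space has 3 × 3 = 9 states, and each added feature with k values multiplies that count by k. The sketch below enumerates the space and maps a symbol pair to a row index, as a tabular learner might. The index scheme and the third feature `TAIL_VOICING` are hypothetical and serve only to illustrate the growth. They are not part of the described system.

```python
from itertools import product
from typing import List, Sequence, Tuple

SLOPES = ("Up", "Straight", "Down")
ENDS = ("Above", "At", "Below")


def state_space(*features: Sequence[str]) -> List[Tuple[str, ...]]:
    """Enumerate every combination of symbolic feature values."""
    return list(product(*features))


def state_index(slope: str, end: str) -> int:
    """Map a (slope, end) pair to a row index (hypothetical scheme)."""
    return SLOPES.index(slope) * len(ENDS) + ENDS.index(end)


# Hypothetical third feature, only to show how the space grows.
TAIL_VOICING = ("Voiced", "Unvoiced")

base = state_space(SLOPES, ENDS)                    # 3 x 3 = 9 states
extended = state_space(SLOPES, ENDS, TAIL_VOICING)  # 3 x 3 x 2 = 18 states
```

The row-major index agrees with the enumeration order of `itertools.product`, so `base[state_index(s, e)] == (s, e)` for every pair.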
The Speech-To-Text module and Text Analyzers deal with speech
recognition. Speech recognition is done incrementally with the best
Figure 3. A 9-second window of spontaneous speech, including speech periods
and silences, categorized into descriptive groups for slope and end position relative to
the average pitch. Only the slope of the fundamental pitch during the 300 msecs immediately
preceding a silence (indicated by the gray area) is categorized (into Up, Straight, and
Down). (Axes: voice F0 in Hz, as produced in near real-time by Prosodica, and
time in hours/minutes/seconds.)