Here and in the following, the various language models are named according to the domain (denoted by the letter D) covered by the data used in the training phase. If a LM contains geographical classes, its name also includes information about the cluster (denoted by the letter C) that contributes the lists of names (cities, streets, hotels, etc.) used to expand the classes. For example, Dgl-Cgl denotes the LM trained on the global (gl) domain with classes expanded with the global (gl) lists of names.
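A minimal sketch of this naming convention is given below; the helper function and the identifiers other than gl are purely illustrative, not part of the actual VICO toolchain.

```python
from typing import Optional

def lm_name(domain: str, cluster: Optional[str] = None) -> str:
    """Compose an LM name such as 'Dgl-Cgl' from a domain and an optional cluster."""
    name = f"D{domain}"
    if cluster is not None:
        name += f"-C{cluster}"
    return name

print(lm_name("gl", "gl"))  # Dgl-Cgl: global domain, global lists of names
print(lm_name("gl", "1"))   # Dgl-C1: global domain, lists restricted to cluster C1
print(lm_name("cmd"))       # Dcmd: command LM without geographical classes
```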
There are different options for building smaller LMs that together provide complete coverage of the application domains foreseen in the VICO system. A simple solution is to reduce the contents of the classes associated with the large lists (cities, streets, hotels, etc.) by introducing geographic clusters and building several LMs, each one covering only a reduced area: in our setup Trentino has been divided into 7 geographic clusters (C1, C2, C3, C4, C5, C6, C7).
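The following sketch illustrates this per-cluster approach under simple assumptions: class tokens in the training sentences are expanded only with the name lists of one geographic cluster before the corresponding Dgl-Ci model is estimated. The class tags, example names, and helper functions are hypothetical, not the actual VICO data or tools.

```python
import random

# Hypothetical class tags for the geographic classes mentioned in the text.
GEO_CLASSES = ("CITY", "STREET", "HOTEL", "POI")

def expand_sentence(sentence, cluster_lists):
    """Replace each geographic class token with a name from the cluster's lists."""
    out = []
    for token in sentence.split():
        if token in GEO_CLASSES:
            out.append(random.choice(cluster_lists[token]))
        else:
            out.append(token)
    return " ".join(out)

def build_cluster_corpus(corpus, cluster_lists):
    """Produce the expanded training text for one Dgl-Ci model."""
    return [expand_sentence(s, cluster_lists) for s in corpus]

# cluster_lists would hold only the names belonging to one cluster (e.g. C1).
c1_lists = {"CITY": ["Trento"], "STREET": ["Via Verdi"],
            "HOTEL": ["Hotel Adige"], "POI": ["Castello del Buonconsiglio"]}
corpus = ["I would like a room in CITY near POI"]
print(build_cluster_corpus(corpus, c1_lists))
```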
Another possible strategy, aimed at exploiting different recognition units, is to build LMs that do not contain the classes associated with the large lists. This idea derives from the observation that a generic dialogue contains relatively few sentences in which a name belonging to a large list is actually pronounced: this leads to the introduction of two further small LMs, namely Dge and Dcmd, which have been trained after removing from the corpus the sentences with geographic class contents (e.g. cities, streets, hotels, POIs). In particular, Dcmd is a very restricted LM (its vocabulary size is 130) and should handle only confirmation/refusal expressions and short commands to the system.
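A possible way to derive the Dge and Dcmd training sets, sketched below under the same assumptions as above, is simply to drop every corpus sentence containing a geographic class token; the tag names and example sentences are illustrative only.

```python
# Hypothetical geographic class tags (cities, streets, hotels, POIs).
GEO_CLASSES = {"CITY", "STREET", "HOTEL", "POI"}

def has_geo_class(sentence: str) -> bool:
    """True if the sentence contains at least one geographic class token."""
    return any(token in GEO_CLASSES for token in sentence.split())

def strip_geo_sentences(corpus):
    """Keep only the sentences without geographic class contents."""
    return [s for s in corpus if not has_geo_class(s)]

corpus = [
    "yes please",              # kept: confirmation expression (Dcmd-style)
    "show me hotels in CITY",  # removed: contains a geographic class
    "repeat the last message", # kept: short command to the system
]
print(strip_geo_sentences(corpus))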
Table 6-2 shows the results for these new LMs: Dgl denotes the original global LM, while the suffix Ci specifies the geographic cluster covered. The higher WRRs obtained with Dgl-C1 are explained by the fact that the WOZ material mainly concerns geographic items associated with C1, i.e. the Trento city area, where the acquisition took place. It is worth mentioning that, although the WRR of Dge and Dcmd is rather low, the corresponding string recognition rate shows that these LMs adequately cover a relevant part of the corpus.