Digital Signal Processing Reference
Figure 3-10. Ratio of vocabulary size of proper nouns.
5.1 Statistics of Speakers' Utterances
The number of speakers in Phase II was 201. Six of the speakers
misunderstood the instructions, and the corresponding dialogs were removed
from the corpus. We analyzed the remaining 195 dialogs, which are referred
to as the “spoken dialog corpus of Phase II” in the following discussion. The
corpus consisted of 390 conversations, which included 28,334 operator
utterances and 27,509 speaker utterances.
The spoken dialog corpus was segmented into morphemes using
ChaSen [4], a Japanese morphological analyzer. The vocabulary size for
speakers' utterances was 4,533 words, consisting of 762 proper nouns and
3,771 words other than proper nouns. The set of proper nouns varied
according to the sightseeing area selected. The remaining words were more
general to the overall set of dialog tasks. From these observations, we
concluded that an ASR lexicon of approximately 4,000 to 5,000 words would
be needed to support recognition of speaker utterances for this general
family of tasks.
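The vocabulary figures above can be reproduced with a simple type count over morpheme-tagged tokens. The following is a minimal sketch, not the authors' procedure: it assumes the analyzer output has already been converted to (surface, tag) pairs, and the function name `vocabulary_split` and the tag label `PROPER_NOUN` are hypothetical placeholders rather than ChaSen's actual output format. The last lines only check the arithmetic of the reported counts.

```python
def vocabulary_split(tokens, proper_tag="PROPER_NOUN"):
    """Count distinct word types, split into proper nouns and all others.

    tokens: iterable of (surface, tag) pairs. The tag label is a
    hypothetical placeholder, not the analyzer's real tag set.
    """
    types = set(tokens)  # distinct (surface, tag) pairs
    proper = {surface for surface, tag in types if tag == proper_tag}
    other = {surface for surface, tag in types if tag != proper_tag}
    return len(proper), len(other)


# Toy input (illustrative only, not taken from the corpus).
toy_tokens = [
    ("Kyoto", "PROPER_NOUN"),
    ("ni", "OTHER"),
    ("Kyoto", "PROPER_NOUN"),
    ("iku", "OTHER"),
]
toy_proper, toy_other = vocabulary_split(toy_tokens)

# Arithmetic check on the counts reported in the text.
REPORTED_PROPER = 762
REPORTED_OTHER = 3771
reported_total = REPORTED_PROPER + REPORTED_OTHER
proper_share = REPORTED_PROPER / reported_total

print(toy_proper, toy_other)
print(reported_total, round(proper_share, 3))
```

Under these counts, proper nouns make up roughly 17% of the speaker vocabulary, which is the area-dependent portion that would have to be swapped when the sightseeing area changes.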
Figure 3-10 provides a categorization of annotated proper nouns. Four
types of proper nouns (places, tourist attractions, shops and traffic facilities)
comprised about 98% of the proper nouns attested in the corpus.