camera both placed on the head and provides an image of what the user is
looking at.
The first method is the most accurate but cannot be used in practical
HCI. Likewise, head-mounted equipment is not a practical choice.
In general, remote-camera eye tracking is used despite its low
accuracy.
The most common way to determine whether the user is attentive is to define a
distance/time threshold: when successive look-at points stay within a distance
threshold of one another for a sufficient amount of time, a fixation is detected.
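The distance/time threshold rule can be sketched in a few lines. This is only a minimal illustration under stated assumptions: gaze samples are taken to be `(t, x, y)` tuples (seconds, pixels), closeness is measured against the running centroid of the current group, and the names `detect_fixations`, `max_dist` and `min_duration`, as well as their default values, are hypothetical and not taken from the text.

```python
from math import hypot


def _centroid(group):
    """Mean (x, y) position of a group of (t, x, y) samples."""
    n = len(group)
    return sum(s[1] for s in group) / n, sum(s[2] for s in group) / n


def detect_fixations(samples, max_dist=30.0, min_duration=0.1):
    """Group time-ordered (t, x, y) gaze samples into fixations.

    A sample joins the current group while it lies within max_dist
    (pixels, assumed value) of the group's centroid. A group becomes a
    fixation only if it spans at least min_duration seconds (assumed value).
    Returns a list of (start_t, end_t, centroid_x, centroid_y).
    """
    fixations = []
    group = []
    # The trailing None is a sentinel that closes the final group.
    for sample in list(samples) + [None]:
        if group and sample is not None:
            cx, cy = _centroid(group)
            if hypot(sample[1] - cx, sample[2] - cy) <= max_dist:
                group.append(sample)
                continue
        # The group is broken (or the input ended): keep it if long enough.
        if group and group[-1][0] - group[0][0] >= min_duration:
            cx, cy = _centroid(group)
            fixations.append((group[0][0], group[-1][0], cx, cy))
        group = [sample] if sample is not None else []
    return fixations
```

Both thresholds would have to be tuned to the tracker's sampling rate and noise level; a low-accuracy remote camera would presumably need a larger distance threshold than a head-mounted device.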
Salvucci and Anderson [12] developed a more sophisticated technique
that classifies eye movements using a Hidden Markov Model (HMM).
First, a two-state HMM separates fixations from saccades.
Because the resulting data are very noisy, a second HMM is used that takes into
account the closeness of each fixation to the screen objects and the context formed by
the other objects the user has just fixated. The model is then compared with
several plausible sequences, and the most likely one is selected (best
overall fit). Fixations carry information about their position and duration.
Position indicates the objects the user has probably dealt with. Duration
indicates the objects the user has most likely involved in detailed
computations [10].
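The first stage, separating fixations from saccades with a two-state HMM, might look roughly like the sketch below. It is not the implementation of [12]: it assumes that the observations are point-to-point gaze velocities, that each state emits velocities from a Gaussian, and that the most likely state sequence is recovered with the Viterbi algorithm. All numeric parameters (`EMISSION`, `TRANSITION`, `START`) are made-up illustrative values, and the second stage, which relates fixations to screen objects, is not covered.

```python
from math import hypot, log, pi

STATES = ("fixation", "saccade")

# Hypothetical parameters, chosen only for illustration.
# Emission: (mean, std) of gaze velocity in pixels/second for each state.
EMISSION = {"fixation": (20.0, 20.0), "saccade": (400.0, 150.0)}
TRANSITION = {
    "fixation": {"fixation": 0.95, "saccade": 0.05},
    "saccade": {"fixation": 0.30, "saccade": 0.70},
}
START = {"fixation": 0.5, "saccade": 0.5}


def gaussian_logpdf(v, mean, std):
    """Log-density of a normal distribution at v."""
    return -0.5 * log(2 * pi * std * std) - (v - mean) ** 2 / (2 * std * std)


def velocities(samples):
    """Point-to-point speeds from time-ordered (t, x, y) samples."""
    return [
        hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:])
    ]


def viterbi(observations):
    """Most likely fixation/saccade label for each observed velocity."""
    if not observations:
        return []
    # Log-score of the best path ending in each state after the first sample.
    score = {
        s: log(START[s]) + gaussian_logpdf(observations[0], *EMISSION[s])
        for s in STATES
    }
    back = []  # back[i][s] = best predecessor of state s at step i + 1
    for v in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log(TRANSITION[p][s]))
            new_score[s] = (
                score[prev]
                + log(TRANSITION[prev][s])
                + gaussian_logpdf(v, *EMISSION[s])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi(velocities(samples))` labels each interval between consecutive samples. Compared with a hard velocity cutoff, the transition probabilities discourage isolated label flips, which is one plausible reason an HMM copes better with noisy data.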
3.1.4 Voice perception
Voice perception involves removing redundancy from the sound wave and building an
effective representation of the main speech features to simplify subsequent
computations. One of the main applications in the field of speech
processing is digital encoding of the voice signal for efficient storage and
transmission.
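As one concrete illustration of compact voice encoding (an example chosen here, not a technique named in the text), mu-law companding squeezes each sample into 8 bits by spending more quantization levels on quiet amplitudes. The sketch below uses the continuous mu-law formula with mu = 255 and assumes samples normalized to [-1, 1]; it does not reproduce the exact bit layout of any telephony standard, and the function names are hypothetical.

```python
from math import copysign, log1p

MU = 255  # companding constant commonly used for 8-bit mu-law


def mu_law_encode(x, mu=MU):
    """Compress a sample x in [-1, 1] into an integer code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    y = copysign(log1p(mu * abs(x)) / log1p(mu), x)  # companded, in [-1, 1]
    return int(round((y + 1) / 2 * 255))


def mu_law_decode(code, mu=MU):
    """Expand an integer code in 0..255 back to a sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return copysign(((1 + mu) ** abs(y) - 1) / mu, y)
```

A round trip such as `mu_law_decode(mu_law_encode(0.01))` returns a value close to the input, with a smaller absolute error for quiet samples than for loud ones.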
Vocal communication between humans and computers consists of two
phases:
text-to-speech (TTS) and
automatic speech recognition (ASR).
TTS is simpler than ASR because of the asymmetries between producing
and recognizing speech.
Two main processes are crucial for both ASR and TTS systems:
segmentation and
adaptation.
Segmentation must be addressed by both TTS and ASR. In the case of ASR,
segmentation can be helped by particular speech styles. Fluent speech
recognition allows the user to have a natural dialogue with the system, but it
is a very hard task.