Human–computer interaction - Pervasive Systems and Ubiquitous Computing


camera both placed on the head and provides an image of what the user is looking at.

The first method is the most accurate but cannot be used in practical HCI. Likewise, head-mounted equipment is not a practical choice. In general, remote-camera eye tracking is used despite its low accuracy.

The most common method to determine whether the user is attentive is to define a distance/time threshold: when look-at points stay closer together than a distance threshold for a sufficient amount of time, a fixation is detected.
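The distance/time threshold rule can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name detect_fixations, the sample format (t, x, y), the choice of measuring distance from the first point of each candidate window, and the default thresholds (30 pixels, 0.1 seconds) are all assumptions made for the example.

```python
import math

def detect_fixations(samples, max_dist=30.0, min_duration=0.1):
    """Group gaze samples into fixations with a distance/time threshold.

    samples: list of (t, x, y) tuples sorted by time (t in seconds,
    x and y in pixels). Both thresholds are hypothetical defaults.
    Returns a list of (start_t, end_t, center_x, center_y).
    """
    fixations = []
    start = 0
    n = len(samples)
    while start < n:
        _, ax, ay = samples[start]  # anchor: first point of the window
        end = start + 1
        # Extend the window while points stay close to the anchor.
        while end < n and math.hypot(samples[end][1] - ax,
                                     samples[end][2] - ay) <= max_dist:
            end += 1
        window = samples[start:end]
        duration = window[-1][0] - window[0][0]
        if duration >= min_duration:
            # Long enough: report the window as one fixation.
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            start = end
        else:
            # Too short: treat the anchor as part of a saccade.
            start += 1
    return fixations

# Made-up gaze trace: two clusters separated by one fast jump.
samples = [
    (0.00, 100, 100), (0.05, 102, 101), (0.10, 101, 99), (0.15, 103, 100),
    (0.20, 400, 300),
    (0.25, 600, 50), (0.30, 602, 52), (0.35, 601, 51),
    (0.40, 603, 50), (0.45, 600, 49),
]
fixations = detect_fixations(samples)
```

With the made-up trace above, the two clusters of nearby points are reported as fixations, and the isolated sample between them is discarded because it does not last long enough.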

Salvucci and Anderson [12] developed a more sophisticated technique that classifies eye movements using a Hidden Markov Model (HMM). First, a two-state HMM is used to separate fixations from saccades. These data are very noisy, so a second HMM is used that takes into account the closeness of each fixation to the screen objects and the context formed by the other objects the user has just fixated. The model is then compared with several plausible sequences, and the most likely one is selected (best overall fit). Fixations carry information about their position and duration. Position indicates the objects the user has probably dealt with. Duration indicates the objects that have most likely involved the user in detailed computations [10].
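The first stage described above, a two-state HMM that separates fixations from saccades, can be illustrated with a small Viterbi decoder. The sketch below is not the model of [12]: it assumes that the observations are point-to-point gaze velocities, that each state emits velocities from a Gaussian distribution, and that the means, standard deviations, and transition probabilities shown are plausible but made-up values. All names (viterbi_two_state, EMIT, TRANS, START) are hypothetical.

```python
import math

STATES = ("fixation", "saccade")

# Assumed parameters (velocity in degrees per second): not from the source.
EMIT = {"fixation": (10.0, 10.0), "saccade": (300.0, 100.0)}  # (mean, std)
TRANS = {
    "fixation": {"fixation": 0.95, "saccade": 0.05},
    "saccade": {"fixation": 0.05, "saccade": 0.95},
}
START = {"fixation": 0.5, "saccade": 0.5}

def log_gauss(x, mean, std):
    """Log density of a Gaussian with the given mean and std at x."""
    return (-0.5 * math.log(2 * math.pi * std * std)
            - (x - mean) ** 2 / (2 * std * std))

def viterbi_two_state(velocities, emit=EMIT, trans=TRANS, start=START):
    """Return the most likely state label for each velocity sample."""
    if not velocities:
        return []
    # Best log probability of any path ending in each state.
    score = {s: math.log(start[s]) + log_gauss(velocities[0], *emit[s])
             for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at step i + 1
    for v in velocities[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda p: score[p] + math.log(trans[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + log_gauss(v, *emit[s]))
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

labels = viterbi_two_state([5, 8, 6, 300, 350, 7, 4])
# labels == ["fixation", "fixation", "fixation",
#            "saccade", "saccade", "fixation", "fixation"]
```

Working in log probabilities avoids numerical underflow on long gaze traces. The second HMM described in the text, which matches fixations to screen objects and their context, is not covered by this sketch.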

3.1.4 Voice perception

Voice perception involves removing redundancy from the sound wave and representing the main speech features effectively, so as to simplify subsequent computations. One of the main applications in the field of speech processing is the digital encoding of the voice signal for efficient storage and transmission.

Vocal communication between humans and computers consists of two phases:

• text-to-speech (TTS) and
• automatic speech recognition (ASR).

TTS is simpler than ASR because of the asymmetries between producing and recognizing speech.

Two main processes are crucial for both ASR and TTS systems:

• segmentation and
• adaptation.

Both TTS and ASR have to deal with segmentation. In the case of ASR, segmentation can be made easier by particular speech styles. Fluent speech recognition allows the user to have a natural dialogue with the system, but it is a very hard task.
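To illustrate how a speech style can make segmentation easier, the sketch below assumes that the speaker pauses between words. The text does not name a specific speech style, so this is only one plausible example. Under that assumption, word boundaries can be found simply by looking for runs of low-energy frames. The function name segment_by_energy, the frame length, and the thresholds are hypothetical choices for the example.

```python
import math

def segment_by_energy(signal, frame_len=160, threshold=0.01,
                      min_silence_frames=3):
    """Split a signal into voiced segments separated by pauses.

    signal: list of floats in [-1, 1]. A segment ends once at least
    min_silence_frames consecutive frames fall below the energy threshold.
    Returns a list of (start_sample, end_sample) pairs.
    """
    segments = []
    start = None   # first sample of the current segment, if any
    silence = 0    # consecutive low-energy frames seen inside a segment
    n_frames = len(signal) // frame_len
    for f in range(n_frames):
        frame = signal[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            if start is None:
                start = f * frame_len
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                # Close the segment at the first silent frame of the pause.
                segments.append((start, (f - silence + 1) * frame_len))
                start = None
                silence = 0
    if start is not None:
        # Signal ended inside a segment: trim any trailing silent frames.
        segments.append((start, (n_frames - silence) * frame_len))
    return segments

def tone(n_samples, amp=0.5, freq=200.0, rate=8000.0):
    """Synthetic stand-in for a spoken word: a plain sine tone."""
    return [amp * math.sin(2 * math.pi * freq * i / rate)
            for i in range(n_samples)]

# Synthetic "utterance": pause, word, pause, word, short pause.
signal = ([0.0] * 480 + tone(640) + [0.0] * 800 + tone(480) + [0.0] * 160)
segments = segment_by_energy(signal)
```

In fluent speech there are no such reliable pauses between words, which is one reason why fluent speech recognition is so much harder than this sketch suggests.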
