participated in the experiment. The participants were either Caucasian Americans (forty-eight participants)
or first-generation Koreans (forty-eight participants). Participants were directed to an
e-commerce Web site, where they listened to descriptions of four different products: a backpack,
a bicycle, an inflatable couch, and a desk lamp.
Half of the Korean participants and half of the Caucasian American participants heard product
descriptions read by a voice with a Korean accent that occasionally used distinctly Korean phrases
(e.g., “Anyonghaseyo,” which is Korean for “hello”). The other half of the participants heard
descriptions read by a voice with an Australian accent that occasionally used phrases associated
with Australians (e.g., “G'day, mate”). For a given participant, each description was read by the
same voice and was accompanied by a full-length photograph of the same product spokesperson.
The spokesperson had a different pose when describing each of the four items to give a sense of
liveliness.
To hear the description of an item, participants clicked on the speaker's photograph. Half of the
participants who heard the Korean-accented voice were shown a photograph of a racially Korean
male; the other half were shown a photograph of a racially Caucasian Australian male to accompany
the voice. Similarly, half of the participants who heard the Australian-accented voice were shown
a photograph of a racially Caucasian Australian male and half were shown a photograph of a Korean
male. Thus, accent and race were fully crossed: each voice was paired with each face. Half of the
participants in each condition were culturally and racially Korean; the other half were culturally
and racially Caucasian American.
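The design described above amounts to a 2 x 2 x 2 between-subjects layout (participant group, voice accent, photograph). The following is a minimal sketch of that layout, assuming every "half" split in the text is exactly even; the names `build_cells`, `is_consistent`, and `PHOTO_ORIGIN` are illustrative and do not come from the original study.

```python
from itertools import product

# Factor levels as described in the text; all identifiers here are illustrative.
PARTICIPANT_GROUPS = ("Korean", "Caucasian American")
ACCENTS = ("Korean", "Australian")
PHOTOS = ("Korean", "Caucasian Australian")
PARTICIPANTS_PER_GROUP = 48  # forty-eight participants per group, per the text

# Place of origin suggested by each photograph (used to judge voice-face consistency).
PHOTO_ORIGIN = {"Korean": "Korean", "Caucasian Australian": "Australian"}


def build_cells():
    """Map each (group, accent, photo) cell to its size.

    Assumption: each 'half' in the description is an exact even split.
    """
    per_cell = PARTICIPANTS_PER_GROUP // (len(ACCENTS) * len(PHOTOS))
    return {
        cell: per_cell
        for cell in product(PARTICIPANT_GROUPS, ACCENTS, PHOTOS)
    }


def is_consistent(accent, photo):
    """True when the voice's accent matches the photograph's place of origin."""
    return PHOTO_ORIGIN[photo] == accent


if __name__ == "__main__":
    for (group, accent, photo), n in build_cells().items():
        label = "consistent" if is_consistent(accent, photo) else "inconsistent"
        print(f"{group:20s} {accent:11s} {photo:21s} n={n:2d} {label}")
```

Under the even-split assumption, each participant group contributes equally to the consistent and inconsistent pairings, which is what allows consistency effects to be compared "regardless of the ethnicity of the user."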
After hearing each product description, participants completed a questionnaire
about the product's likability and the description's credibility. After listening to the
descriptions of all four products, participants also rated the agent's overall quality.
Although there was no logical linkage between the para-linguistic cues of the voice and the race
of the agent, participants were clearly disturbed when the agent did not “look the way it sounded.”
The photographic agents that had “consistent” voices and faces were perceived to be much better
than those that were inconsistent, regardless of the ethnicity of the user. Participants also found
the products to be better and the product descriptions to be more credible when the two “places of
origin” were consistent.
Designing Ethnicity in Interfaces
Toward the end of 2004, we made an attempt to listen to as many different voice interfaces in the
United States as we could. We called airline and train reservation systems, technical help centers,
in-car navigation systems, etc. While the voices and content certainly reflected different personalities
and included both genders, there was a striking similarity: Every single interface sounded
like it was spoken by a Caucasian from the upper Midwest, the accent that is considered to be
“neutral” in the United States (MacNeil and Cran, 2004). This was remarkable, given that in the
next few years, “whites” will be a minority of the country, and the upper Midwest is not one of the
most populous regions of the country.
We have already dismissed the argument that Caucasians (as distinct from ethnicities that are
associated with other accents) speak without an “accent”; everyone has an accent. The argument
that the “white” accent is standard and hence understandable by the whole population was used in
the early years of television to exclude minorities; obviously, this argument cannot hold sway. It
is undoubtedly alienating for a large fraction of the population never to encounter a voice that
sounds like their own when they use a voice interface.
The problem is made even more apparent in voice interfaces than in traditional media because
non-Caucasian accents tend to be less understood by voice recognition systems. This provides an