interact with the speaker, sometimes through the intermediation of a human proxy (e.g. Baecker et al., 2003). Also on this front, which is very active in research on computer-supported cooperative work, not much has been harvested in the e-learning scenario.
Using an ASR to extract text from a video-lecture is not trivial, as good sound quality is not always guaranteed. Moreover, lecture language resembles conversational language (Glass et al., 2007), and often contains domain-specific, rare words, or even words in a different language. Being able to supplement the ASR with a suitable language model is therefore important. Choudhari et al. (2007) propose starting from textbooks to identify the most relevant terms that should compose the vocabulary.
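By way of illustration, the following minimal sketch captures this idea: frequent terms in a textbook that are absent from a general-purpose corpus are taken as candidate vocabulary entries. The file names and the frequency threshold are invented for the example, and a real system would use a proper tokenizer and a much larger background corpus.

    # Sketch: harvest domain-specific terms from a textbook to extend an
    # ASR vocabulary, in the spirit of Choudhari et al. (2007). The file
    # names and the frequency threshold are illustrative assumptions.
    import re
    from collections import Counter

    def tokenize(text):
        # Lowercase word tokens; a real system would use a proper tokenizer.
        return re.findall(r"[a-z]+", text.lower())

    def domain_vocabulary(textbook_path, background_path, min_count=5):
        textbook = Counter(tokenize(open(textbook_path, encoding="utf-8").read()))
        background = set(tokenize(open(background_path, encoding="utf-8").read()))
        # Keep frequent textbook terms that are absent from general text:
        # these are the rare, domain-specific words an ASR tends to miss.
        return {w for w, n in textbook.items()
                if n >= min_count and w not in background}

    vocab = domain_vocabulary("textbook.txt", "general_corpus.txt")
    print(sorted(vocab)[:20])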
In the literature there are examples of segmenting lectures based on ASR text (Lin et al., 2003; Fujii et al., 2006) or on text and slides (Repp et al., 2008a). Repp et al. (2008b) also built a Q/A system based on the semantic annotation generated for the lectures. Hürst et al. (2006) and Fogarolli et al. (2007) used a multimodal approach, indexing both ASR-extracted text and slides to allow searching libraries of lectures. Fogarolli et al. (2009) also annotated the lectures by using Wikipedia, so as to automatically extract from the ASR text the most important topics that each lecture dealt with.
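To make the segmentation idea concrete, the following sketch splits a transcript at dips in lexical cohesion between adjacent windows of sentences, in the spirit of TextTiling. It is a simplified heuristic, not the algorithm of any of the papers cited above, and the window size and threshold are arbitrary.

    # Sketch: split an ASR transcript into topically homogeneous chunks by
    # locating dips in lexical similarity between adjacent sentence windows
    # (a simplified TextTiling-style heuristic, not the cited algorithms).
    import re
    from collections import Counter
    from math import sqrt

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = sqrt(sum(v * v for v in a.values())) * \
              sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def segment(sentences, window=5, threshold=0.1):
        bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
        boundaries = []
        for i in range(window, len(bags) - window):
            left = sum(bags[i - window:i], Counter())
            right = sum(bags[i:i + window], Counter())
            if cosine(left, right) < threshold:
                boundaries.append(i)  # low cohesion suggests a topic shift
        return boundaries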
Supporting efficient and user-friendly navigation with interactive content overviews may need more research on the interface to be used; Mertens et al. (2006) explored this facet. Other topics include multimedia shrinking and summarization, video abstraction, lecture segmentation, and gesture analysis. A review of these approaches applied to the video-lecture domain is reported elsewhere (Ronchetti, 2010).
Semantic Indexing, Multimodal Access, and Other Topics
Indexing and allowing search in a distributed collection of video-lectures is a key element in an unstructured scenario like that of self-study to support life-long learning. WLAP is a notable project (Bousdira et al., 2001) that used the notion of "learning object" (an IEEE standard) to represent lectures, and that envisioned a distributed architecture where multiple archives could be integrated. Indexing terms is however not enough: what would really be needed is semantic modelling of the video-lectures, i.e. the possibility of understanding which topics are dealt with. A good semantic model would help in searching and retrieving content (also in the form of query/answer, Q/A), in summarising lectures, in splitting lectures into smaller, homogeneous chunks, and in indexing them. Such tasks can benefit from the information that can be extracted from the slides, notes and other associated material, and from transcripts of the speech. One would then be able to find the most relevant lectures for a given topic in a large collection, and to quickly identify the most interesting portions of each candidate lecture.
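As a toy illustration of such retrieval over a lecture collection, one could index the concatenation of each lecture's ASR transcript and slide text with TF-IDF, reducing the multimodal idea to its simplest term-based form. The sketch below assumes scikit-learn is installed, and the sample documents are invented placeholders.

    # Sketch: term-based retrieval over a lecture collection, indexing each
    # lecture's ASR transcript together with its slide text (the multimodal
    # idea above, reduced to a TF-IDF toy). scikit-learn is assumed to be
    # installed; the sample documents are invented placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    lectures = {
        "lecture01": "transcript of the talk ... text of its slides ...",
        "lecture02": "another transcript ... other slides ...",
    }
    ids = list(lectures)
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(lectures[i] for i in ids)

    def search(query, k=3):
        # Rank lectures by cosine similarity between the query and documents.
        scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
        return sorted(zip(ids, scores), key=lambda p: -p[1])[:k]

    print(search("semantic indexing"))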
A natural approach is to apply text techniques (such as content extraction, summarization, etc.) to multimedia through the intermediate step of applying Automated Speech Recognition (ASR) techniques to the audio tracks. Wald (2005) suggested that text from ASR could be used to create captions (e.g. for deaf learners), to assist those who, for cognitive, physical or sensory reasons, find note-taking difficult, and to allow search.
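A present-day sketch of this pipeline follows, using the open-source Whisper model purely as a stand-in for "an ASR" (the tool postdates the works discussed here, and the file name is an assumption):

    # Sketch: turning a lecture's audio track into searchable text and
    # timed captions. The open-source Whisper model is used purely as a
    # present-day stand-in for "an ASR"; the file name is an assumption.
    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")
    result = model.transcribe("lecture.mp3")

    # Full transcript, usable for indexing and full-text search:
    print(result["text"][:200])

    # Timed segments, usable as captions (e.g. for deaf learners):
    for seg in result["segments"][:5]:
        print(f"{seg['start']:6.1f}-{seg['end']:6.1f}  {seg['text']}")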
CONCLUSION
There is overwhelming evidence that video-lectures effectively support learning. The literature demonstrates that in most cases this is achieved better by recording live lectures than by creating ad-hoc, synthetic content. Also, it is now clear that video-lectures should not be considered as a mere replacement of classroom lectures. Besides the obvious support of students who cannot