analysed teacher gestures to predict and detect slide
transitions.
The problem of detecting slide transitions can
be avoided by directly capturing the VGA output
(as e.g. in Rowe & Casalaina, 2006). This has the
advantage of capturing not only the slides but
anything that happens on the presenter's screen
(videos, demonstrations, live tutorials, etc.).
On the other hand, it has the disadvantage
of losing precious semantic information: timing
of the slide transitions (and slide titles) offer an
important navigation feature, and the content of
the slides can be parsed to allow at least a limited
possibility of searching the video through textual
queries. Tanaka et al. (2004) detect slide changes
through an HTTP proxy, but this approach only
works for HTML-based presentations. Another
possibility is to have the speaker's laptop signal
PowerPoint slide transitions (as e.g. in Baecker,
2003), but this requires speakers to use Windows
and to accept setting up a macro on their
machines.
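The simplest detectors of this kind work by frame differencing on the captured screen output: a slide change alters a large fraction of the pixels at once, whereas cursor movement changes only a few. Below is a minimal sketch of the idea; the intensity step of 16 levels and the 5% changed-pixel threshold are illustrative assumptions, not values taken from the systems cited above.

```python
import numpy as np

def detect_slide_transitions(frames, threshold=0.05):
    """Return the indices of frames that differ from their predecessor
    on more than `threshold` of their pixels (a naive slide-change
    detector; the threshold is an illustrative assumption)."""
    transitions = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(int) - frames[i - 1].astype(int))
        changed = np.mean(diff > 16)  # fraction of noticeably changed pixels
        if changed > threshold:
            transitions.append(i)
    return transitions
```

A real system would add debouncing (ignoring changes that revert within a few frames) to avoid firing on embedded video playback.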
A pioneering example of lecture room automa-
tion was Bellcore's AutoAuditorium (Bianchi,
1998). Other systems are reported e.g. by Liu et
al. (2001) and Wallick et al. (2004). Wang et al.
(2007) used gesture and posture recognition to
simulate suitable camera motion and to perform
video cutting effects.
An alternative approach is to use just one
camera with panoramic video capturing, and then
to extract the portion of image of interest (Sun,
2005). A rather similar idea is implemented in the
EYA system (Canessa et al., 2008). They used a
wide-angle photo camera to record high resolution
pictures every 10 seconds. The client is a little
different from the usual video+slide format: it
shows the video, a large thumbnail of the current
picture, and when the user moves the mouse over
the thumbnail, a high resolution subset of the im-
age is shown where other systems put the slide.
In this way, the user can focus on whatever detail
he or she wants (be it the blackboard, the projected
screen or other). The advantage is that fully
traditional lectures (based on the chalkboard or
even on viewgraphs) can be fully supported, even
though here, too, the semantics carried by slide
change detection is lost.
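The zoom-on-hover behaviour of the EYA client can be approximated by cropping a fixed-size window from the stored high-resolution picture around the point the user is hovering over, clamped to the image borders. The sketch below is our own illustration of the idea; the function and parameter names are not taken from Canessa et al.

```python
import numpy as np

def extract_detail(image, cx, cy, width, height):
    """Crop a width x height window of the high-resolution image,
    centred on the hover point (cx, cy) and clamped to the image
    borders, so the crop never falls outside the picture."""
    h, w = image.shape[:2]
    x0 = min(max(cx - width // 2, 0), w - width)
    y0 = min(max(cy - height // 2, 0), h - height)
    return image[y0:y0 + height, x0:x0 + width]
```

The same cropping step serves the panoramic-capture approach of Sun (2005): there the window would track a region of interest automatically rather than follow the mouse.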
Virtual Cameraman
Capturing visual details - such as following the
teacher to better capture his/her expressions and
body language, or zooming in on the blackboard
when needed - is one of the advantages offered by
a (costly) human operator. Some researchers have
attempted to implement a “virtual cameraman”.
Such a virtual actor should be able to follow
the speaker as s/he moves, and also to adopt a
“savvy” style, as performed by an art director,
e.g. changing cameras, focusing on the semantically
important details, etc.
important details etc. Tracking a human object has
been the focus of a research field that is too wide
to review it here. Typical techniques are sensor-
based tracking (in which the target is required to
wear a suitable, e.g. magnetic, device), sound-
based tracking (based on stereo perception or
commercially available microphone array-based
sound localization), video-based tracking (such as
skin-colour based, motion-based, shape-based).
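As an illustration of the last family, a naive skin-colour tracker can threshold each frame against a rough colour range and report the centroid of the matching pixels; production systems use calibrated colour models and temporal filtering instead. In the sketch below the RGB bounds are illustrative assumptions, not tuned values.

```python
import numpy as np

def track_by_colour(frame_rgb, lower=(95, 40, 20), upper=(255, 220, 170)):
    """Locate the centroid of pixels falling inside a rough RGB skin
    range (the bounds are illustrative, not calibrated values).
    Returns (row, col) of the centroid, or None if nothing matched."""
    lower = np.array(lower)
    upper = np.array(upper)
    mask = np.all((frame_rgb >= lower) & (frame_rgb <= upper), axis=-1)
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return int(ys.mean()), int(xs.mean())
```

Feeding the centroid to a pan-tilt controller, smoothed over time, gives the most basic form of the "virtual cameraman" described above.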
Capturing the Blackboard
Since in many lectures the blackboard still plays
an important role, it is important to be able to
effectively capture what happens there. We have
already mentioned various ways to do so: using
high-resolution pictures or having a virtual or
real cameraman. Recent years have seen, in some
countries, a relatively wide diffusion of interactive
whiteboards: touch-sensitive devices connected
to a computer and a projector that allow capturing
the screen and manually interacting with the
objects present on the screen (menus, windows,
icons, etc.). Also in this case the research field is
too wide to be discussed here, but we shall briefly
mention a few cases in which such or similar de-