address the challenges in providing robust, efficient, and integrated stream mining
solutions for next-generation embedded computer vision systems.
The methods discussed in this section were originally presented in [2]. Here, we provide a concise summary of the developments in [2] in the context of
ASM systems for embedded computer vision; for full details on these methods, we
refer the reader to [2].
12.5.1 Overview
In most video-based object or face recognition services on mobile devices, each
device captures and transmits video frames over a wireless channel to a remote
computing service (a.k.a. the “cloud”) that performs the heavy-duty video feature
extraction and recognition tasks for a large number of mobile devices. The major
challenges of such scenarios stem from the highly-varying contention levels in the
wireless local area network (WLAN), as well as the variation in the task-scheduling
congestion in the cloud.
In order for each device to maximize its object or face recognition rate under
such contention and congestion variability, a systematic learning framework based
on multi-armed bandits has been developed [2]. Unlike well-known reinforcement
learning techniques, which exhibit very slow convergence rates when operating
in highly dynamic environments, this bandit-based, systematic learning approach
quickly approaches the optimal transmission and processing-complexity policies
based on feedback on the experienced dynamics (contention and congestion levels).
The case study presented in this section centers on this bandit-based, systematic
learning approach.
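To make the bandit formulation concrete, the sketch below treats each candidate (transmission, processing-complexity) configuration as an arm of a multi-armed bandit and selects among the arms with the standard UCB1 rule, using per-frame recognition success as the reward. This is a minimal, generic illustration under stated assumptions, not the specific algorithm of [2]: the arm set, the simulated success probabilities, and the class name `UCB1Bandit` are invented for demonstration.

```python
import math
import random

class UCB1Bandit:
    """Generic UCB1 multi-armed bandit. Each arm stands for one candidate
    (transmission, processing-complexity) configuration; the reward is the
    recognition success (1.0) or failure (0.0) observed for a frame."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Play each arm once before applying the UCB rule.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Mean reward plus an exploration bonus that shrinks as an arm
        # accumulates pulls (classic UCB1 index).
        ucb = [v + math.sqrt(2 * math.log(total) / c)
               for v, c in zip(self.values, self.counts)]
        return ucb.index(max(ucb))

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Illustrative simulation: three hypothetical configurations whose true
# recognition-success probabilities are fixed here, standing in for the
# (unmodeled) effects of WLAN contention and cloud congestion.
random.seed(0)
true_rates = [0.4, 0.7, 0.55]
bandit = UCB1Bandit(len(true_rates))
for _ in range(5000):
    arm = bandit.select_arm()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
best = bandit.counts.index(max(bandit.counts))
print(best)
```

Over many rounds the exploration bonus decays for well-sampled arms, so pulls concentrate on the configuration with the highest empirical recognition rate, which is the behavior the bandit-based framework exploits to track the optimal policy.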
Many of the envisaged applications and services for wearable sensors, smartphones,
tablets, or portable computers in the next ten years will involve analysis of
video streams for event, action, object, or user recognition [21, 32]. In this process,
these devices experience time-varying channel conditions, traffic loads, and processing
constraints at the remote cloud-computing servers where the data analysis takes place.
Examples of early commercial services in this domain include Google Goggles,
Google Glass, Facebook automatic face tagging [7], and Microsoft's Photo Gallery
face recognition.
12.5.2 Application Example
Figure 12.4 presents an example of such deployments. Video content producers
include several types of sensors, mobile phones, and other low-end portable
devices that capture, encode (typically via a hardware-supported MPEG/ITU-T
codec), and transmit video streams to a remote computing server for recognition
or authentication purposes. A group of M devices in the same WLAN comprises a