address the challenges in providing robust, efficient, and integrated stream mining
solutions for next-generation embedded computer vision systems.
The methods discussed in this section were originally presented in [2]. Here, we provide a concise summary of the developments in [2] in the context of
ASM systems for embedded computer vision; for full details on these methods, we
refer the reader to [2].
12.5.1 Overview
In most video-based object or face recognition services on mobile devices, each
device captures and transmits video frames over a wireless channel to a remote
computing service (a.k.a. the “cloud”) that performs the heavy-duty video feature
extraction and recognition tasks for a large number of mobile devices. The major
challenges of such scenarios stem from the highly-varying contention levels in the
wireless local area network (WLAN), as well as the variation in the task-scheduling
congestion in the cloud.
In order for each device to maximize its object or face recognition rate under
such contention and congestion variability, a systematic learning framework based
on multi-armed bandits has been developed [2]. Unlike well-known reinforcement
learning techniques, which exhibit very slow convergence rates when operating
in highly dynamic environments, this bandit-based, systematic learning approach
quickly approaches the optimal transmission and processing-complexity policies
based on feedback on the experienced dynamics (contention and congestion levels).
The case study presented in this section centers on this bandit-based, systematic
learning approach.
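To make the bandit formulation concrete, the sketch below treats each candidate (transmission, processing-complexity) configuration as an arm of a multi-armed bandit and selects among the arms with the standard UCB1 rule, using per-frame recognition success as the reward. This is a minimal, generic illustration under stated assumptions, not the specific algorithm of [2]: the arm set, the simulated success probabilities, and the class name `UCB1Bandit` are invented for demonstration.

```python
import math
import random

class UCB1Bandit:
    """Generic UCB1 multi-armed bandit. Each arm stands for one candidate
    (transmission, processing-complexity) configuration; the reward is the
    recognition success (1.0) or failure (0.0) observed for a frame."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Play each arm once before applying the UCB rule.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Mean reward plus an exploration bonus that shrinks as an arm
        # accumulates pulls (classic UCB1 index).
        ucb = [v + math.sqrt(2 * math.log(total) / c)
               for v, c in zip(self.values, self.counts)]
        return ucb.index(max(ucb))

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Illustrative simulation: three hypothetical configurations whose true
# recognition-success probabilities are fixed here, standing in for the
# (unmodeled) effects of WLAN contention and cloud congestion.
random.seed(0)
true_rates = [0.4, 0.7, 0.55]
bandit = UCB1Bandit(len(true_rates))
for _ in range(5000):
    arm = bandit.select_arm()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
best = bandit.counts.index(max(bandit.counts))
print(best)
```

Over many rounds the exploration bonus decays for well-sampled arms, so pulls concentrate on the configuration with the highest empirical recognition rate, which is the behavior the bandit-based framework exploits to track the optimal policy.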
Many of the envisaged applications and services for wearable sensors, smartphones,
tablets, or portable computers in the next ten years will involve analysis of
video streams for event, action, object, or user recognition [21, 32]. In this process,
these devices experience time-varying channel conditions, traffic loads, and processing
constraints at the remote cloud-computing servers where the data analysis takes place.
Examples of early commercial services in this domain include Google Goggles,
Google Glass, Facebook automatic face tagging [7], and Microsoft's Photo Gallery
face recognition.
12.5.2 Application Example
Figure 12.4 presents an example of such deployments. Video content producers
include several types of sensors, mobile phones, and other low-end portable
devices that capture, encode (typically via a hardware-supported MPEG/ITU-T
codec), and transmit video streams to a remote computing server for recognition
or authentication purposes. A group of M devices in the same WLAN comprises a