cloud congestion levels are both independent of the actions taken by the devices within their clusters. This makes each device's learning problem independent, since the decisions made by other devices do not affect its reward.
12.5.3 Relation to Prior Work
Each mobile device of Fig. 12.4 seeks to maximize its own expected recognition
rate at the minimum possible cost in terms of utilized wireless resources (i.e., MAC
superframe transmission opportunities used). To this end, several approaches have
been proposed that are based on reinforcement learning [36], such as Q-learning [30].
In these, the goal is to learn the state-value function, which provides a measure of the expected long-term performance (utility). However, such methods incur large memory overheads for storing the state-value function, and they are slow to adapt to new or dynamically changing environments. A better approach is to intermittently explore and exploit, as needed, in order to capture such changes. Index policies for multi-armed bandit (MAB) problems, contextual bandits [22, 33], or epsilon-decreasing algorithms [3] can be used for this task (a minimal sketch of the latter is given below). However, none of the existing bandit frameworks takes the contention and congestion conditions into account as contexts in the application under consideration.
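To make the explore-exploit idea concrete, the following is a minimal sketch of the epsilon-decreasing strategy of the kind cited in [3]. The arm set, the Bernoulli reward model, and all parameter values are illustrative assumptions rather than the chapter's actual system; in the offloading setting, each arm would correspond to a choice of how many transmission opportunities to use.

import random

def epsilon_decreasing_bandit(reward_fns, horizon, eps0=5.0, seed=0):
    # Epsilon-decreasing selection: explore with probability min(1, eps0/t)
    # at round t, otherwise exploit the arm with the best empirical mean.
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms        # pulls per arm
    means = [0.0] * n_arms       # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        eps = min(1.0, eps0 / t)                              # decaying exploration rate
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
        total_reward += reward
    return total_reward, means

# Two hypothetical offloading choices with Bernoulli recognition rewards.
arms = [lambda rng: 1.0 if rng.random() < 0.7 else 0.0,
        lambda rng: 1.0 if rng.random() < 0.5 else 0.0]
total, means = epsilon_decreasing_bandit(arms, horizon=10000)

Note that only per-arm counts and means are stored, which illustrates the memory advantage over maintaining a full state-value table.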
12.5.4 Learning Based on Multi-user Bandits
Motivated by the lack of efficient methods that fully capture the problems related to
online learning in multi-user wireless networks and cloud computing systems with
uncertain and highly varying resource provisioning, an online systematic learning
theory based on multi-user contextual bandits has been developed. This learning
theory can be viewed as a natural extension of the basic MAB framework. Analytic
estimates have been derived to compare its efficiency against the complete knowledge
(or “oracle”) benchmark in which the expected reward of every choice is known by
the learner. Unlike Q-learning [ 36 ] and other learning-based methods, it is proven
that the regret bound—the loss incurred by the algorithm against the best possible
decision that assumes full knowledge of contention and congestion conditions—is
logarithmic if users do not collaborate and each would like to maximize the user's
own utility. Finally, the contextual bandit framework discussed here is general, and
can be used for learning in various kinds of wireless embedded computer vision
applications that involve offloading of selected processing tasks. Henceforth in this
chapter, we refer to the contextual bandit framework by the abbreviation CBF.
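As an illustration of the CBF's ingredients, the sketch below maintains separate UCB1 statistics for each discretized (contention, congestion) context and tracks the regret against the oracle benchmark described above; per-context index policies of this kind are known to achieve logarithmic regret. The context set, the number of arms, and the reward probabilities are hypothetical values chosen for the example, and the chapter's actual algorithm and regret analysis differ in detail.

import math
import random

class ContextualUCB1:
    # UCB1 statistics kept separately per discrete context, e.g., quantized
    # (contention, congestion) levels observed by the device.
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.stats = {}  # context -> (pull counts, empirical means, total pulls)

    def select(self, context):
        counts, means, total = self.stats.setdefault(
            context, ([0] * self.n_arms, [0.0] * self.n_arms, 0))
        for arm in range(self.n_arms):  # play every arm in this context once
            if counts[arm] == 0:
                return arm
        # UCB index: empirical mean plus an exploration bonus.
        return max(range(self.n_arms),
                   key=lambda a: means[a]
                   + math.sqrt(2.0 * math.log(total) / counts[a]))

    def update(self, context, arm, reward):
        counts, means, total = self.stats[context]
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        self.stats[context] = (counts, means, total + 1)

# Hypothetical mean rewards of three offloading choices per context.
mean_reward = {("low", "low"): [0.9, 0.6, 0.5],
               ("high", "high"): [0.3, 0.4, 0.8]}
rng = random.Random(1)
learner = ContextualUCB1(n_arms=3)
regret = 0.0
for t in range(5000):
    ctx = rng.choice(sorted(mean_reward))  # context arrives exogenously
    arm = learner.select(ctx)
    reward = 1.0 if rng.random() < mean_reward[ctx][arm] else 0.0
    learner.update(ctx, arm, reward)
    # Regret against the oracle that knows every expected reward.
    regret += max(mean_reward[ctx]) - mean_reward[ctx][arm]

Because statistics are kept per observed context rather than per full system state, memory grows with the number of contexts times the number of arms, which is the trade-off against state-value methods noted in Sect. 12.5.3.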