11.5 Peer-to-Peer Distributed Algorithms
Client-server architectures do not scale for a wide range of vision problems. The community has now developed a number of distributed algorithms for important vision tasks, but we still have much to learn.
Long-term tracking using cameras with nonoverlapping views has received a great deal of attention. Kim and Wolf [13] developed a distributed algorithm for Markov chain Monte Carlo tracking. Cameras estimate local paths based on local communications; the local paths are then transmitted through the network and concatenated to build longer tracks of the target.
Esterle et al. [7] use autonomous, self-interested agents to learn the vision graph during operation. The algorithm does not require multi-camera calibration, since it does not rely on a priori camera topology information. Cameras bid for the right to track an object; the utility of a tracked object to a camera depends on the target's visibility to the camera and the camera's confidence in its tracking estimate. Sales of a target from one camera to another are used to build the structure of the vision graph. As the system makes more observations, the communications between cameras can become more targeted based on their understanding of the vision graph structure.
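A hedged sketch of the auction step follows: cameras submit bids, the seller hands the target to the highest bidder, and repeated sales accumulate into an estimate of the vision graph. The multiplicative utility and the bookkeeping are simplifying assumptions, not Esterle et al.'s exact formulation.

class Camera:
    def __init__(self, name):
        self.name = name
        self.sales = {}  # neighbor name -> sale count; approximates vision-graph edges

    def utility(self, visibility, confidence):
        # Assumed form: value grows with how well the target is seen and
        # how confident the local tracker is.
        return visibility * confidence

def auction(seller, bids):
    """Sell the target to the highest-utility bidder and record the trade."""
    buyer = max(bids, key=bids.get)
    seller.sales[buyer.name] = seller.sales.get(buyer.name, 0) + 1
    return buyer

cam1, cam2, cam3 = Camera("c1"), Camera("c2"), Camera("c3")
bids = {cam2: cam2.utility(0.9, 0.8), cam3: cam3.utility(0.4, 0.9)}
winner = auction(cam1, bids)
print(winner.name, cam1.sales)  # c2 {'c2': 1}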
Wan and Li [27] formulated an online algorithm for associating observations with targets; at each new observation, nodes exchange information about their observations and the network with their neighbors, then update their inferences.
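The following toy fragment illustrates the flavor of such an update, under assumptions of our own: each node averages its observation with values received from neighbors, then associates the fused estimate with the nearest known target.

def on_observation(local_obs, neighbor_obs, targets):
    """Fuse local and neighbor observations, then update the nearest target."""
    pooled = [local_obs] + neighbor_obs
    estimate = sum(pooled) / len(pooled)
    target = min(targets, key=lambda t: abs(t["pos"] - estimate))
    target["pos"] = estimate  # refine the local inference
    return target

targets = [{"id": 1, "pos": 0.0}, {"id": 2, "pos": 5.0}]
print(on_observation(4.6, [4.8, 5.1], targets)["id"])  # -> 2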
Sek Chai [26] describes visual sensor networks (VSNs), distributed smart camera systems built from smartphone processors. Nodes distribute metadata for search purposes while video data remains on the local nodes. Metadata is organized into a video catalog accessible using key search indices, including time, location, and object description. Nodes decide when to activate their cameras in order to manage their energy consumption; a node may be turned on at a predefined time or by an external event such as sound or vibration detection. Their implementation on a Qualcomm Snapdragon MSM8960 used 1.6 W for capture, processing, and storage of compressed video, with motion tracking and analytics generation requiring an additional 0.4 W.
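A minimal sketch of such a catalog appears below, with field names and matching rules of our own invention; only the metadata is shared across the network, while video stays on the capturing node, as the text describes.

catalog = []  # metadata visible to the whole network

def publish(node_id, time, location, objects):
    catalog.append({"node": node_id, "time": time,
                    "location": location, "objects": set(objects)})

def search(time_range=None, location=None, obj=None):
    """Return the nodes holding video that matches every supplied index."""
    hits = catalog
    if time_range is not None:
        hits = [m for m in hits if time_range[0] <= m["time"] <= time_range[1]]
    if location is not None:
        hits = [m for m in hits if m["location"] == location]
    if obj is not None:
        hits = [m for m in hits if obj in m["objects"]]
    return [m["node"] for m in hits]

publish("cam7", 1200, "gate-a", ["person", "red car"])
print(search(time_range=(1000, 1500), obj="red car"))  # -> ['cam7']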
An early example of in-network search was provided by Yan et al. [32], who developed a distributed image search system for a sensor network based on iMote2 sensor nodes. Their system uses SIFT to generate feature vectors that are clustered into visterms (quantized representations of image features). They optimized both their vocabulary tree and inverted index for flash memory. They used buffered lookup to reduce the performance penalty of storing the vocabulary tree in flash, which has much longer access times than RAM: they read the tree in subtree-sized increments that fit into RAM, then buffer a collection of SIFT vectors for lookup within each loaded subtree.
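Here is a sketch of buffered lookup under stated assumptions: integers stand in for SIFT vectors, a modulo stands in for descending the in-RAM top of the tree, and load_subtree stands in for a slow flash read. The point is the batching, which loads each subtree once per batch instead of once per query.

def load_subtree(subtree_id):
    # Stand-in for reading and deserializing one RAM-sized subtree from flash.
    return {"id": subtree_id,
            "visterm": lambda v: hash((subtree_id, v)) % 1000}

def buffered_lookup(queries, buffer_size=64):
    """Map feature vectors to visterms, amortizing flash reads over a batch."""
    results = []
    for start in range(0, len(queries), buffer_size):
        batch = queries[start:start + buffer_size]
        by_subtree = {}
        for v in batch:  # group the batch by destination subtree
            by_subtree.setdefault(v % 4, []).append(v)
        for sid, vecs in by_subtree.items():
            sub = load_subtree(sid)  # one flash read serves the whole group
            results += [sub["visterm"](v) for v in vecs]
    return results

print(buffered_lookup(list(range(10)), buffer_size=8))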
Optimizations for the inverted index were designed to compensate for the write characteristics of flash: writes are slower than reads, and an entire block must be erased and rewritten to change any value in the block [19]. They stored the inverted index in a log structure, which records changes by appending them to a log; as a consequence, a data access may require multiple file reads. They minimize this read overhead by writing only visterms with many associated image IDs to flash; Zipf's Law predicts that a small fraction of the visterms accounts for the bulk of the index entries.
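A hedged sketch of this policy: postings buffer in RAM, and only visterms that accumulate many image IDs are appended to the flash log, so in-place block rewrites are avoided. The threshold and layout are assumptions, not the published design.

FLUSH_THRESHOLD = 4  # assumed cutoff for "many associated image IDs"

ram_index = {}   # visterm -> image IDs still buffered in RAM
flash_log = []   # append-only log; each entry is one flushed posting list

def add_posting(visterm, image_id):
    ram_index.setdefault(visterm, []).append(image_id)
    if len(ram_index[visterm]) >= FLUSH_THRESHOLD:
        # Appending sidesteps the erase-and-rewrite cost of in-place updates.
        flash_log.append((visterm, ram_index.pop(visterm)))

def lookup(visterm):
    """Gather a visterm's image IDs from the scattered log entries (the
    source of the multiple reads per access) plus the RAM buffer."""
    ids = [i for term, chunk in flash_log if term == visterm for i in chunk]
    return ids + ram_index.get(visterm, [])

for img in range(6):
    add_posting("v42", img)
print(lookup("v42"))  # -> [0, 1, 2, 3, 4, 5]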