needs to run at real-time frame rates, and with the “always on, always augmented” use case the power consumption of the mobile device becomes a major challenge, since the battery may drain within approximately one hour. Next, we look at some of the commonly used AR technologies.
Marker-based Tracking: To obtain the camera pose in real time, marker-based techniques are used. One example of marker-based tracking is the IKEA app from Metaio, which uses tracking and monocular simultaneous localization and mapping (SLAM) algorithms; here the marker is used to obtain the scale of the room. Markers are easily detected in the image due to their distinctive color and pattern. The high-contrast black-and-white square block pattern, together with four known marker points, allows accurate calculation of the camera pose. The drawback is that the marker must always remain visible in the camera's field of view, and detection is susceptible to illumination variation.
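As an illustration of how four known marker corners yield a camera pose, the sketch below uses OpenCV's solvePnP with a square marker of known side length and a previously calibrated camera. The marker size, function names, and the use of OpenCV itself are assumptions for illustration, not details taken from the text.

```python
import numpy as np
import cv2

# Hypothetical 3D corner coordinates of a square marker with 80 mm sides,
# expressed in the marker's own frame (Z = 0 plane). The ordering matches
# what cv2.SOLVEPNP_IPPE_SQUARE expects.
MARKER_SIZE = 0.08  # metres (assumed)
object_points = np.array([
    [-MARKER_SIZE / 2,  MARKER_SIZE / 2, 0.0],
    [ MARKER_SIZE / 2,  MARKER_SIZE / 2, 0.0],
    [ MARKER_SIZE / 2, -MARKER_SIZE / 2, 0.0],
    [-MARKER_SIZE / 2, -MARKER_SIZE / 2, 0.0],
], dtype=np.float32)

def estimate_marker_pose(image_corners, camera_matrix, dist_coeffs):
    """Recover the camera pose from the four detected marker corners.

    image_corners : (4, 2) array of pixel coordinates, same order as object_points.
    camera_matrix : 3x3 intrinsic matrix from a prior calibration.
    dist_coeffs   : lens distortion coefficients from the same calibration.
    """
    ok, rvec, tvec = cv2.solvePnP(
        object_points, image_corners.astype(np.float32),
        camera_matrix, dist_coeffs,
        flags=cv2.SOLVEPNP_IPPE_SQUARE,  # solver specialised for planar squares
    )
    if not ok:
        return None
    rotation, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 matrix
    return rotation, tvec               # camera pose relative to the marker
```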
Marker-less Tracking: The typical “marker-less” pipeline takes a video frame, extracts features such as corners, describes them in a descriptor vector, and matches them against a database of previously recorded reference object descriptors. Once objects are detected, they are tracked frame by frame. The key to a robust, accurate, and fast 3D feature-tracking pipeline is finding the right balance between the number of features, pyramid scaling, and recording the 'right' information in the descriptors. This task requires a lot of experience, real-life expertise, and validation; thus, not many really good 3D feature trackers are available on the market.
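A minimal sketch of the detect-describe-match stage just described is given below, using ORB features and brute-force Hamming matching via OpenCV. The choice of ORB, the feature budget, and the helper names are assumptions, since the text does not prescribe a particular detector or descriptor.

```python
import cv2

# Detector/descriptor and matcher for the detect-describe-match stage.
orb = cv2.ORB_create(nfeatures=2000)   # roughly 1,000-2,000 points per object
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def build_reference_database(reference_image):
    """Detect and describe features of the reference object once, offline."""
    keypoints, descriptors = orb.detectAndCompute(reference_image, None)
    return keypoints, descriptors

def match_frame(frame, reference_descriptors):
    """Match features of the current video frame against the reference set."""
    keypoints, descriptors = orb.detectAndCompute(frame, None)
    if descriptors is None:
        return keypoints, []
    matches = matcher.match(reference_descriptors, descriptors)
    # Keep only the most consistent correspondences for later pose estimation.
    matches = sorted(matches, key=lambda m: m.distance)[:200]
    return keypoints, matches
```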
The amount of detected feature points depends on the size and complexity of the
object or environment to be detected and tracked. Typically, for a single 3D object,
the algorithm has to deal with 1,000-2,000 feature points, for small rooms about
5,000 features, and for outdoor scenarios 10,000-20,000 features. These reference
features have to be matched against all newly detected features in every frame at 30 fps, resulting in more than 200 GOPS for the detection or initialization phase, whereas the tracking phase is less demanding. SLAM, on the other hand, localizes the camera in the mapped environment and estimates the camera pose relative to that map; the better we can map the environment, the more precise the camera pose becomes, and vice versa. Common feature detectors extract corners, blobs, patches, and edges, but only a few, such as FAST [16], are suitable for embedded real-time processing. In some scenarios, dense tracking is needed to compute structure from motion, and Lucas-Kanade feature tracking is widely used. Feature matching is performed from frame to frame or from key frame to frame using template-based matching or feature descriptors. These algorithms require a large amount of computation and memory bandwidth, which has a direct impact on the power requirement.
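For the tracking phase, the sketch below combines FAST corner detection with pyramidal Lucas-Kanade optical flow, both mentioned above, again using OpenCV. The threshold, window size, and pyramid depth are assumed values, not figures from the text.

```python
import numpy as np
import cv2

# FAST detector and pyramidal Lucas-Kanade parameters (assumed values).
fast = cv2.FastFeatureDetector_create(threshold=25)
lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

def detect_points(gray):
    """Detect FAST corners and convert them to the point format LK expects."""
    keypoints = fast.detect(gray, None)
    return np.array([kp.pt for kp in keypoints], dtype=np.float32).reshape(-1, 1, 2)

def track_points(prev_gray, curr_gray, prev_points):
    """Track points from the previous frame into the current frame."""
    curr_points, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_points, None, **lk_params)
    good = status.reshape(-1).astype(bool)   # keep only successfully tracked points
    return prev_points[good], curr_points[good]
```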
Edge-based Tracking: Currently, tracking and mapping approaches based on distinctive feature points are the most common algorithms employed for AR purposes. Usually, 2D points in images are selected and represented using the standard computer vision detector-descriptor scheme. Positive features of point-based tracking approaches are the following:
high level of invariance to rotation and translation;