Survey of Pen-and-Paper Computing - Human-Computer Interaction


is already available in several mobile phones. It is likely that RFID will gain more

prominence in the near future and will supersede visual markers in some domains.

Several electromagnetic solutions (e.g. Ubisense 7) can capture the

physical location of objects. To date, however, they require large markers that cannot

be used when working with thin sheets of paper.

Some research projects have developed individual solutions for identifying objects.

As with RFID, integrated circuits storing a unique identifier are attached

to physical objects. Communication with a reading device takes place via a wired

connection (e.g. [130, 49]) or a wireless connection (e.g. [114]).

Content-based Identification and Tracking

Content-based approaches do not require markers. Instead, documents are identified

and tracked solely by their visual appearance. A camera observes the scene, or

documents are scanned, and image processing techniques are used to analyze the

images.

The most commonly used approach relies on SIFT features [83]. To identify

a document page, it first has to be registered. The system captures specific optical

features of the image and stores these features as a fingerprint of the document page.

When a document page appears in the camera image, its features are captured and

compared against the fingerprints in the database to identify the page. This technique

performs quite well if the camera provides sufficient resolution and the number of

different pages is not very high. For instance, Kim et al. [60] reached a recognition

rate of more than 90 % if the document width in the camera image was at least

300 pixels. Liu et al. [79] present an improved set of features that was inspired by

the SIFT feature set. They achieved recognition rates of more than 99 %, even if

pages had to be identified from a very large set of pages. However, their evaluation

excluded lighting effects and camera noise, so it must be assumed that in real setups

performance will be lower.
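
The register-then-match procedure described above can be sketched in a few lines. This is a minimal illustration, not the method of any of the cited systems: the class `FingerprintIndex`, its methods, the `ratio` and `min_votes` parameters, and the 4-dimensional toy descriptors are all assumptions made for the example (real SIFT descriptors have 128 dimensions and would come from an image-processing library). Each query descriptor votes for the page that owns its nearest stored descriptor, and a vote only counts if the nearest match is clearly better than the second nearest.

```python
import math
from collections import Counter


class FingerprintIndex:
    """Hypothetical in-memory store of page fingerprints (lists of descriptors)."""

    def __init__(self):
        self._entries = []  # list of (page_id, descriptor) pairs

    def register(self, page_id, descriptors):
        """Store the descriptors of a page as its fingerprint."""
        for d in descriptors:
            self._entries.append((page_id, tuple(d)))

    def identify(self, query_descriptors, ratio=0.75, min_votes=2):
        """Return the page id with the most accepted matches, or None."""
        votes = Counter()
        for q in query_descriptors:
            # Brute-force nearest-neighbour search over all stored descriptors.
            ranked = sorted((math.dist(q, d), pid) for pid, d in self._entries)
            if len(ranked) < 2:
                continue
            (best, pid), (second, _) = ranked[0], ranked[1]
            # Ratio test (assumed here): accept only unambiguous matches.
            if best < ratio * second:
                votes[pid] += 1
        if not votes:
            return None
        pid, count = votes.most_common(1)[0]
        return pid if count >= min_votes else None


# Toy data: two registered pages with three descriptors each.
index = FingerprintIndex()
index.register("page_a", [(0, 0, 1, 0), (1, 0, 0, 0), (0, 1, 0, 0)])
index.register("page_b", [(5, 5, 5, 5), (9, 0, 9, 0), (0, 9, 0, 9)])

# A noisy camera view of page_a, and a view that matches nothing well enough.
seen = index.identify([(0.1, 0, 0.9, 0), (1, 0.1, 0, 0), (0, 1.1, 0, 0)])
unknown = index.identify([(3, 3, 3, 3)])
print(seen, unknown)
```

The brute-force search grows linearly with the number of stored descriptors, which is one reason why a very large page set is harder to handle; a practical system would presumably use an approximate nearest-neighbour index instead.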

Another approach uses optical character recognition of document text [24]. The

recognized text is used for querying a database of documents and retrieving the

digital version. As an alternative to text recognition, the pattern of white spaces between

words has proven to be highly distinctive for each text. Brick Wall Coding [26] detects

the white gaps between words and takes them as features. It has lower recognition

performance than SIFT or FIT features, but it can detect a document page even when

the page is only partially visible to the camera.
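
The idea of using white gaps between words as features can be illustrated with a toy sketch. It is not the published Brick Wall Coding algorithm: the functions `render`, `word_gap_signature` and `locate`, the `min_gap` threshold, and the one-dimensional "pixel row" are assumptions chosen to keep the example small. The sketch splits one binarized text row at its white gaps, records the widths of the word blocks in between, and then looks for the signature of a partial view inside the signature of the full line.

```python
def render(text, scale=3):
    """Toy stand-in for a binarized text row: 1 = ink, 0 = white."""
    return [0 if ch == " " else 1 for ch in text for _ in range(scale)]


def word_gap_signature(row, min_gap=2):
    """Widths of the word blocks separated by white gaps of at least min_gap pixels.

    Shorter white runs are treated as spacing inside a word.
    """
    widths = []
    start = None  # index where the current word block began
    white = 0     # length of the current white run inside a block
    for i, px in enumerate(row):
        if px:
            if start is None:
                start = i
            white = 0
        elif start is not None:
            white += 1
            if white >= min_gap:
                # The block ended where this white run started.
                widths.append(i - white + 1 - start)
                start = None
                white = 0
    if start is not None:
        widths.append(len(row) - white - start)
    return widths


def locate(view, page):
    """Index at which the partial signature occurs in the page signature, else None."""
    n = len(view)
    for offset in range(len(page) - n + 1):
        if page[offset:offset + n] == view:
            return offset
    return None


page_sig = word_gap_signature(
    render("documents are identified by their visual appearance"))

# A partial view cuts words at its borders, so the first and last
# widths are unreliable and are dropped before matching.
view_sig = word_gap_signature(render("fied by their vis"))
position = locate(view_sig[1:-1], page_sig)
print(page_sig, view_sig, position)
```

Because only a contiguous part of the signature has to match, the partial view can still be placed within the line, which mirrors the robustness to partial visibility noted above. Exact matching is used here for brevity; camera images would require tolerance to scale and noise.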

A drawback of content-based approaches is that the visual appearance must contain

enough information for unambiguous identification. For instance, it is not possible

to uniquely identify one of several copies of the same document page. Moreover,

content-based identification is challenging if the user modifies documents, for

instance by making annotations.

7 http://www.ubisense.net
