Machine learning and natural language processing - The Computing Universe: A Journey through a Revolution


predicting these “part probabilities” from a single depth image, they were able to find accurate predictions of the three-dimensional locations of the different joints in the body. The Xbox team was then able to take these predictions and stitch them into a coherent three-dimensional “skeletal” representation of the body.
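One simple way to picture the step from per-pixel part probabilities to joint positions is a probability-weighted average of the three-dimensional points. The sketch below is illustrative only: the function name `estimate_joints`, the data layout, and the weighted-mean rule are assumptions made here, not the method the researchers are described as using.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]


def estimate_joints(
    points: List[Point3D],
    part_probs: List[List[float]],
    num_parts: int,
) -> Dict[int, Point3D]:
    """Return one 3D position per body part (hypothetical helper).

    points[i] is the 3D position of pixel i, and part_probs[i][p] is the
    probability that pixel i belongs to body part p. Each part's position
    is the probability-weighted mean of all pixel positions.
    """
    joints: Dict[int, Point3D] = {}
    for part in range(num_parts):
        total = sx = sy = sz = 0.0
        for (x, y, z), probs in zip(points, part_probs):
            weight = probs[part]
            total += weight
            sx += weight * x
            sy += weight * y
            sz += weight * z
        if total > 0:  # skip parts that no pixel supports
            joints[part] = (sx / total, sy / total, sz / total)
    return joints


# Three pixels: the first two certainly belong to part 0, the third to part 1.
points = [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0), (0.0, 4.0, 2.0)]
part_probs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
joints = estimate_joints(points, part_probs, num_parts=2)
```

A weighted mean is easy to follow but can be pulled off target by stray pixels that were assigned to the wrong part, so a real system would likely need something more robust.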

Decision trees work like the guessing game Twenty Questions (Fig. 14.7), where each question reduces the number of possible answers. For every pixel in the image, the computer asks a series of questions, such as “Is that point on the right of the image more than twelve centimeters farther away than the point under this pixel?” Based on the answers to questions like these, the program proceeds farther down the tree, asking additional questions until it can assign the pixel to a specific body part. The challenge was how best to determine the questions in the tree; the decision trees used in the final system had a depth of around twenty levels and contained nearly a million nodes. The answer was to train the system on a very large set of examples. With the Xbox team, the researchers recorded hours of video footage of actors at a motion capture studio. They filmed the actors performing actions that would be useful for gaming, such as dancing, running, fighting, driving, and so on. These data were then used to automatically animate computer graphic models of different human shapes and sizes. The researchers then simulated the depth readings that the Kinect sensor would have recorded for these animated actions. The resulting training set contained millions of synthetically generated depth images and the simulated true body positions (Fig. 14.8).
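The question-by-question descent through a tree can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the `Question` and `Leaf` classes, the pixel offsets, the background depth used for probes that fall outside the image, and the tiny one-question tree stand in for the trained trees described above, which were learned from data rather than written by hand.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    body_part: str  # label assigned to any pixel that reaches this leaf


@dataclass
class Question:
    dx: int               # column offset of the probe point, in pixels
    dy: int               # row offset of the probe point, in pixels
    threshold_cm: float   # "more than this many centimeters farther away?"
    if_yes: Union["Question", Leaf]
    if_no: Union["Question", Leaf]


def depth_at(image: List[List[float]], row: int, col: int,
             background_cm: float = 1000.0) -> float:
    """Depth in centimeters, or a large background value outside the image."""
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return background_cm


def classify_pixel(image: List[List[float]], row: int, col: int,
                   node: Union[Question, Leaf]) -> str:
    """Walk down the tree, answering one depth question per level."""
    while isinstance(node, Question):
        probe = depth_at(image, row + node.dy, col + node.dx)
        here = image[row][col]
        node = node.if_yes if probe - here > node.threshold_cm else node.if_no
    return node.body_part


# A toy depth image (centimeters) and a hand-written one-question tree.
image = [[200.0, 200.0, 230.0],
         [200.0, 200.0, 230.0]]
tree = Question(dx=1, dy=0, threshold_cm=12.0,
                if_yes=Leaf("edge of torso"), if_no=Leaf("torso"))

label_center = classify_pixel(image, 0, 1, tree)  # neighbor is 30 cm farther
label_left = classify_pixel(image, 0, 0, tree)    # neighbor is at equal depth
```

In this sketch each pixel answers only one question. A binary tree in which every path asks twenty questions has 2**20 = 1,048,576 leaves, which is the same order of magnitude as the node count quoted above and far too many questions to write by hand.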

The final challenge was computational. Shotton's previous work on object recognition in photographs had used training sets of only a few hundred images, and the training phase took less than a day on a single machine (Fig. 14.9). With millions of training images, the Microsoft researchers had to work out how to distribute the training across a cluster of one hundred or so computers. This distributed processing enabled them to keep the training time down to less than a day. With these advances, the researchers and the Xbox team developed very powerful skeletal tracking software and used it to create a whole variety of “controller-free” games. Microsoft launched Xbox Kinect in November 2010 with the marketing slogan “You Are the Controller.” Kinect rapidly became the fastest-selling consumer electronics device in history, according to Guinness World Records.

Fig. 14.7. The 20q game from Radica uses AI technologies to guess the item you are thinking of in twenty questions or fewer. The game was runner-up for “Game of the Year” in 2005.

Fig. 14.8. Illustration of three-dimensional image recognition by body parts. The system learns to convert the raw depth images on the left into body part images, and then convert them to a three-dimensional stick version of the body joints. Panel labels: Depth image, Body parts, 3D joint proposals.
