predicting these “part probabilities” from a single depth image, they were able
to find accurate predictions of the three-dimensional locations of the different
joints in the body. The Xbox team was then able to take these predictions
and stitch them into a coherent three-dimensional “skeletal” representation
of the body.
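The step from per-pixel "part probabilities" to a joint position can be illustrated with a small sketch. The text does not say how the researchers combined the per-pixel predictions, so the probability-weighted average below is a simplifying assumption rather than their method. The function name `estimate_joint` and the pixel layout (a 3D point paired with a dictionary of part probabilities) are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point3D = Tuple[float, float, float]
# Hypothetical layout: each pixel carries its 3D position and the
# probability that it belongs to each body part.
LabelledPixel = Tuple[Point3D, Dict[str, float]]


def estimate_joint(pixels: List[LabelledPixel], part: str) -> Optional[Point3D]:
    """Return the probability-weighted mean 3D position for one body part.

    Assumption: a plain weighted average stands in for whatever
    aggregation the real system used.
    """
    total = 0.0
    sx = sy = sz = 0.0
    for (x, y, z), probs in pixels:
        w = probs.get(part, 0.0)  # weight = probability of this part
        total += w
        sx += w * x
        sy += w * y
        sz += w * z
    if total == 0.0:
        return None  # no pixel supports this part
    return (sx / total, sy / total, sz / total)


# Toy data: two pixels that look like "hand", one that looks like "head".
pixels = [
    ((0.0, 0.0, 2.0), {"hand": 1.0}),
    ((2.0, 0.0, 2.0), {"hand": 1.0}),
    ((10.0, 10.0, 3.0), {"hand": 0.0, "head": 1.0}),
]
hand = estimate_joint(pixels, "hand")
head = estimate_joint(pixels, "head")
```

Here the two "hand" pixels are weighted equally, so the estimate falls midway between them, while the "head" pixel contributes nothing to the hand estimate.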
Decision trees work like the guessing game Twenty Questions (Fig. 14.7),
where each question reduces the number of possible answers. For every pixel in
the image, the computer asks a series of questions, such as “Is that point on the
right of the image more than twelve centimeters farther away than the point
under this pixel?” Based on the answers to questions like these, the program
proceeds farther down the tree, asking additional questions until it can assign the
pixel to a specific body part. The challenge was how best to determine the
questions in the tree; the decision trees used in the final system had a depth of around
twenty levels and contained nearly a million nodes. The answer was to train the
system on a very large set of examples. With the Xbox team, the researchers
recorded hours of video footage of actors at a motion capture studio. They filmed
the actors performing actions that would be useful for gaming, such as dancing,
running, fighting, driving, and so on. These data were then used to automatically
animate computer graphics models of different human shapes and sizes. The
researchers then simulated the readings that the Kinect sensor would produce for
these actions. The resulting training set contained millions of synthetically
generated depth images and the simulated true body positions (Fig. 14.8).
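The per-pixel questioning described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Kinect implementation: the class names `Question` and `Leaf`, the function `classify_pixel`, the labels, the hand-built three-node tree, and the rule that probes falling outside the image read as very far away (`FAR_CM`) are all hypothetical. In the real system the questions were learned from the training set rather than written by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

FAR_CM = 1000.0  # assumed depth for probes that fall outside the image


@dataclass
class Leaf:
    body_part: str  # label assigned when the questions run out


@dataclass
class Question:
    offset: Tuple[int, int]  # (dx, dy) from the pixel to the probed point
    threshold_cm: float      # "more than this much farther away?"
    if_yes: Union["Question", Leaf]
    if_no: Union["Question", Leaf]


def depth_at(depth: List[List[float]], x: int, y: int) -> float:
    """Depth in centimeters, or FAR_CM when (x, y) is off the image."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return FAR_CM


def classify_pixel(tree: Union[Question, Leaf],
                   depth: List[List[float]], x: int, y: int) -> str:
    """Walk down the tree, answering one depth question per level."""
    node = tree
    while isinstance(node, Question):
        dx, dy = node.offset
        diff = depth_at(depth, x + dx, y + dy) - depth_at(depth, x, y)
        node = node.if_yes if diff > node.threshold_cm else node.if_no
    return node.body_part


# Hand-built toy tree with illustrative labels (not real body parts).
tree = Question(
    offset=(1, 0), threshold_cm=12.0,
    if_yes=Leaf("right_edge"),
    if_no=Question(offset=(-1, 0), threshold_cm=12.0,
                   if_yes=Leaf("left_edge"), if_no=Leaf("interior")),
)

# A 3 x 4 depth image: a flat surface at 200 cm, background at 300 cm.
depth = [[200.0, 200.0, 200.0, 300.0] for _ in range(3)]

labels = [classify_pixel(tree, depth, x, 1) for x in range(3)]
```

Each pixel is classified independently, and the cost per pixel is bounded by the depth of the tree, which is one reason this style of classifier suits real-time use.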
The final challenge was computational. Shotton's previous work on object
recognition in photographs had used training sets of only a few hundred images,
and the training phase took less than a day on a single machine (Fig. 14.9). With
millions of training images, the Microsoft researchers had to work out how
to distribute the training on a cluster of one hundred or so computers. This
distributed processing enabled them to keep the training time down to less
than a day. With these advances, the researchers and the Xbox team developed
very powerful skeletal tracking software and used it to create a whole variety
of “controller-free” games. Microsoft launched Xbox Kinect in November 2010
with the marketing slogan “You Are the Controller.” Kinect rapidly became
the fastest-selling consumer electronics device in history, according to Guinness
World Records.
Fig. 14.7. The 20q game from Radica
uses AI technologies to guess the item
you are thinking of in twenty questions
or fewer. The game was runner-up for
“Game of the Year” in 2005.
Fig. 14.8. Illustration of three-dimensional
image recognition by body parts.
The system learns to convert the raw
depth images on the left into body part
images, and then convert them to a
three-dimensional stick version of the
body joints.
Panels: depth image; body parts; 3D joint proposals.