Camera Tracking from Video
In order to composite 3D content into a video like this one, it's necessary to create an accurate representation
of the camera's movement within Blender. When you render the 3D content using Blender's virtual camera,
the placement and rotation of that content in the frame must match the placement and rotation it should
have in the live-action video.
This can be calculated from the relative 2D movement of 2D points in the video that correspond to points in the
real 3D space. Specifically, the phenomenon of parallax is used. For the same lateral camera movement, points
nearer to the camera will move farther across the screen than points more distant from the camera.
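As a rough illustration of this, consider an idealized pinhole-camera model (a minimal sketch for intuition, not Blender's actual solver): the horizontal image position of a point is proportional to X/Z, so the same lateral camera move produces a larger on-screen shift for nearer points. The focal length, depths, and shift below are made-up illustration values.

# Pinhole-camera sketch of parallax: image x-coordinate is f * X / Z.

def image_x(f, X, Z):
    """Horizontal image coordinate of a point at lateral offset X and depth Z."""
    return f * X / Z

f = 35.0             # focal length (arbitrary units)
camera_shift = 0.5   # lateral camera movement

for depth in (2.0, 10.0):  # a near point and a far point
    before = image_x(f, X=1.0, Z=depth)
    # Moving the camera right by camera_shift reduces the point's camera-space X.
    after = image_x(f, X=1.0 - camera_shift, Z=depth)
    print(f"depth {depth:>4}: on-screen shift = {abs(after - before):.2f}")

# The near point (Z = 2) shifts several times farther across the frame
# than the far point (Z = 10) for the same lateral camera movement.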
I specifically mention lateral camera movement, because certain common camera movements do not have
this characteristic of parallax. In particular, tripod pans, where the camera rotates at a fixed point, do not yield
parallax information. Imagine standing in a park full of trees. Your friend hides behind a tree where you cannot
see her. In order to catch a glimpse of your friend, you must move laterally. If you only rotate in place, you will
not see your friend. In fact, if you had friends hiding from you behind every tree in the park, you would never
find any of them by rotating like a camera on a tripod. Only lateral motion increases the spatial information
available to you.
The same is true of camera tracking. In order to re-create a 3D environment automatically, the video provided
must include parallax information, i.e., lateral movement. This isn't a major problem. If you have a shot that is
purely a tripod pan, you don't actually need to do camera tracking. Simply compositing 3D content as though
you were working with a still image will work fine. Blender can handle shots that have both lateral movement
and panning, but it does not get the parallax information necessary to re-create the space from the panning
movement.
The points used for tracking are represented in the Clip Editor as tracking markers. These are set by hand or
automatically to correspond with recognizable points on the video. When I say “recognizable,” I mean points
that Blender's computer vision pattern-recognition algorithm can identify as being the same feature from frame
to frame. Features are real-world things that the algorithm attempts to recognize. A bunch of black pixels
surrounded by a field of yellow pixels in frame 35 is likely to be the same feature as a similar bunch of black
pixels surrounded by yellow in frame 36, even if the specific pixels are different because the feature moved.
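Markers can also be placed from a script rather than by hand in the Clip Editor, since Blender's Python API exposes the same tracking data. The sketch below assumes a clip named "shot.mov" has already been loaded; the track name and the marker coordinates are placeholder values, not anything prescribed by Blender.

import bpy

# Rough sketch: create a track and place a marker through the Python API.
clip = bpy.data.movieclips["shot.mov"]
tracking = clip.tracking

# Create a new track starting at frame 1.
track = tracking.tracks.new(name="Track.hand.001", frame=1)

# Insert a marker at frame 1. Coordinates are normalized (0.0 to 1.0)
# relative to the clip, with (0, 0) at the lower-left corner.
marker = track.markers.insert_frame(1, co=(0.45, 0.6))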
There are basically two ways to go about camera tracking in Blender. You can use automatic feature detection
and then correct the many errors by hand, or you can do feature selection by hand, which should result in fewer
errors from the start but will proceed more slowly to build up a sufficiently large feature set. Both methods can
be painstaking, depending largely on the content of the video you're trying to track. I will describe using hand-
selected features.
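For reference, the automatic route can also be driven from Blender's clip operators, though the same caveat about correcting bad tracks afterward applies. This is only a minimal sketch: these operators normally run from the Clip Editor, so running them from a script may require a context override, and the default settings shown here will almost certainly need tuning for a real shot.

import bpy

# Rough sketch of the automatic workflow using Blender's clip operators.
bpy.ops.clip.detect_features()                              # place markers on detected features
bpy.ops.clip.track_markers(backwards=False, sequence=True)  # track them through the clip
bpy.ops.clip.solve_camera()                                 # reconstruct the camera motion from the tracks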
Anatomy of a Good Feature
A good feature should be recognizable in 2D. That is, it should be composed of contrasting pixels in the image.
It must also represent a specific, unique 3D point in the real world. For example, an intersection between two
overhead cables might form a single point in a 2D image, but if the cables are separated in space, the
intersection does not represent a unique 3D point. This would not be a good feature. The kind of intuition and common
sense a human being can use when selecting features to track is the advantage of selecting features by hand.
Figure 10-6 shows several more examples of good tracking points (on the left) that represent specific 3D points
and bad tracking points (on the right) that will change with the camera movement because they do not
correspond with a real 3D point. Specular highlights and curved surfaces can also be a source of bad features.
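Once a solve has been run, one way to spot features like these that did not correspond to a fixed 3D point is to look at each track's reprojection error. A sketch, assuming a solved clip named "shot.mov" and an arbitrary 2.0-pixel threshold chosen purely for illustration:

import bpy

# After solving, list tracks whose average reprojection error is high;
# these are often the bad features described above.
clip = bpy.data.movieclips["shot.mov"]
for track in clip.tracking.tracks:
    if track.average_error > 2.0:
        print(f"{track.name}: average error {track.average_error:.2f} px")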
 