Recording and Playback
The principle of recording and playing back audio is actually pretty simple in theory. For recording, we keep track of how much pressure the molecules that form the sound waves exert on an area in space at each point in time. Playing this data back is merely a matter of getting the air molecules surrounding the speaker to swing and move the same way they did when we recorded them.
In practice, it is of course a little more complex. Audio is usually recorded in one of two ways: analog or digital. In both cases, the sound waves are recorded with some sort of microphone, which usually consists of a membrane that translates the pressure exerted by the molecules into some sort of signal. How this signal is processed and stored is what makes the difference between analog and digital recording. We are working digitally, so let's just have a look at that case.
Recording audio digitally means that the state of the microphone membrane is measured and stored at discrete time steps. Depending on the pressure exerted by the surrounding molecules, the membrane can be pushed inward or outward relative to a neutral state. This process is called sampling, as we take membrane state samples at discrete points in time. The number of samples we take per time unit is called the sampling rate. Usually the time unit is one second, so the sampling rate is measured in hertz (Hz), or samples per second. The more samples per second, the higher the quality of the audio. CDs play back at a sampling rate of 44,100 Hz, or 44.1 kHz. Lower sampling rates are found, for example, when transferring voice over the telephone line (8 kHz is common in this case).
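To make the idea of sampling a bit more tangible, here's a minimal sketch in Java that generates one second of a 440 Hz sine tone at CD quality; the tone frequency, the duration, and the class name are just illustrative choices, not anything tied to a specific API.

public class SamplingSketch {
    public static void main(String[] args) {
        int sampleRate = 44100;                    // samples per second (Hz), CD quality
        double frequency = 440.0;                  // tone frequency in Hz, picked for illustration
        float[] samples = new float[sampleRate];   // one second of mono audio
        for (int i = 0; i < samples.length; i++) {
            double time = i / (double) sampleRate; // seconds elapsed at this sample
            // Each array element is one sample: the membrane state at that time step.
            samples[i] = (float) Math.sin(2.0 * Math.PI * frequency * time);
        }
        System.out.println("Generated " + samples.length + " samples");
    }
}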
The sampling rate is only one attribute responsible for a recording's quality. The way in which we store each membrane state sample also plays a role, and it too is subject to digitization. Let's recall what the membrane state actually is: it's the distance of the membrane from its neutral state. Because it makes a difference whether the membrane is pushed inward or outward, we record the signed distance. The membrane state at a specific time step is thus a single negative or positive number. We can store this signed number in a variety of ways: as a signed 8-, 16-, or 32-bit integer, as a 32-bit float, or even as a 64-bit float. Every data type has limited precision. An 8-bit signed integer can distinguish 127 positive and 128 negative distance values. A 32-bit integer provides a lot more resolution. When stored as a float, the membrane state is usually normalized to the range between −1 and 1, where the maximum positive and minimum negative values represent the farthest the membrane can travel from its neutral state. The membrane state is also called the amplitude. It represents the loudness of the sound hitting the membrane.
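Here's a small sketch of how an integer sample format relates to the normalized float representation, assuming 16-bit samples; the scaling constants come from the range of a signed 16-bit integer, and the method and class names are made up for illustration.

public class SampleFormatSketch {
    // Map a signed 16-bit sample to the normalized range [-1, 1].
    // 32768 is the magnitude of the most negative 16-bit value.
    static float toFloat(short pcm) {
        return pcm / 32768f;
    }

    // Map a normalized float back to a signed 16-bit sample.
    static short toPcm16(float normalized) {
        // Clamp first so values outside [-1, 1] don't wrap around and distort.
        float clamped = Math.max(-1f, Math.min(1f, normalized));
        return (short) (clamped * 32767f);
    }

    public static void main(String[] args) {
        System.out.println(toFloat((short) 32767));  // roughly 1.0: membrane at its maximum excursion
        System.out.println(toFloat((short) -16384)); // -0.5: membrane pushed halfway inward
        System.out.println(toPcm16(0.25f));          // 8191: a quarter of the maximum amplitude
    }
}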
With a single microphone, we can only record mono sound, which loses all spatial information. With two microphones, we can measure sound at different locations in space and thus get so-called stereo sound. You might achieve stereo sound, for example, by placing one microphone to the left and another to the right of an object emitting sound. When the sound is played back simultaneously through two speakers, we can reasonably reproduce the spatial component of the audio. But this also means that we need to store twice as many samples for stereo audio as for mono audio.
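To see what that doubling means in practice, here's a quick back-of-the-envelope sketch that computes the storage needed for one minute of audio at CD quality, mono versus stereo; the one-minute duration and the 16-bit sample format are assumptions chosen for the example.

public class StorageSketch {
    public static void main(String[] args) {
        int sampleRate = 44100;       // samples per second per channel
        int bytesPerSample = 2;       // 16-bit samples
        int seconds = 60;             // one minute of audio

        long monoBytes = (long) sampleRate * bytesPerSample * seconds;
        long stereoBytes = monoBytes * 2;  // two channels, so twice the samples

        System.out.println("Mono:   " + monoBytes + " bytes");   // 5,292,000 bytes, roughly 5 MB
        System.out.println("Stereo: " + stereoBytes + " bytes"); // 10,584,000 bytes, roughly 10 MB
    }
}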
Playback is a simple matter in the end. Once we have our audio samples in digital form, with a specific sampling rate and data type, we can throw that data at our audio processing unit, which transforms the information into a signal for an attached speaker. The speaker interprets this signal and translates it into the vibration of a membrane, which in turn causes the surrounding air molecules to move and produce sound waves. It's exactly what is done for recording, only reversed!
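To round things off, here's a hedged sketch of that last step using Java's standard javax.sound.sampled API (chosen purely for illustration, not tied to any particular game framework): it generates a short sine tone and streams the 16-bit samples to the default audio device.

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class PlaybackSketch {
    public static void main(String[] args) throws LineUnavailableException {
        int sampleRate = 44100;
        // 16-bit, mono, signed, little-endian PCM.
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);

        // One second of a 440 Hz tone, encoded as two bytes per sample.
        byte[] buffer = new byte[sampleRate * 2];
        for (int i = 0; i < sampleRate; i++) {
            short sample = (short) (Math.sin(2.0 * Math.PI * 440.0 * i / sampleRate) * 32767);
            buffer[2 * i] = (byte) (sample & 0xff);             // low byte
            buffer[2 * i + 1] = (byte) ((sample >> 8) & 0xff);  // high byte
        }

        // Hand the samples to the audio hardware, which drives the speaker membrane.
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        line.start();
        line.write(buffer, 0, buffer.length);
        line.drain();   // wait until everything has been played
        line.close();
    }
}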
 