Introduction to MIDI and Computer Music: Digital Audio, Part 2
When converting an analog audio signal to a digital representation, an AD converter takes amplitude measurements, or samples, of the waveform at equally spaced points in time. The sampling rate — i.e., the frequency of these measurements — determines the highest frequency of the input signal that can be captured. We saw in class that an insufficiently high sampling rate causes the loss of high frequencies, resulting in muted or murky sound quality. (For a review of these ideas, see Digital Audio, Part 1.)
Sampling resolution refers to the accuracy, rather than the frequency, of these amplitude measurements. The greater the accuracy, the more faithful the digital representation of the analog audio signal. Accuracy depends on the kind of numbers used to encode the amplitude measurements. Most converters use integers to store these measurements.
Consider the waveform graph of an analog signal shown below.
The gray vertical lines indicate moments when we take a measurement of the amplitude of the analog signal, governed by the sampling rate. The gray horizontal lines are the integer values that are available for representing those measurements. In this example, there are only eight integers. When we take a measurement, we need to round off the real value of the analog waveform to the nearest integer. This process is called quantization.
The blue stars below are the integer measurements. The resulting series of integers is 5 6 7 7 5 4 3 1 2 5 7 5 7 4, which is the stream of numbers that the computer stores to represent this waveform.
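As a rough illustration of the rounding step, here is a minimal Python sketch of quantization onto a 0-to-7 grid. The function name `quantize` and the analog readings are made up for this example; they are not taken from the graph.

```python
import math

def quantize(value, bits):
    """Round a real amplitude to the nearest integer level available with `bits` bits."""
    max_level = 2 ** bits - 1              # highest level, e.g. 7 for 3 bits
    level = math.floor(value + 0.5)        # round to the nearest integer (halves go up)
    return min(max(level, 0), max_level)   # stay on the available grid

# Hypothetical analog readings, measured on the same 0-to-7 scale as the graph
analog = [5.2, 6.4, 6.9, 3.5, 0.8]
samples = [quantize(v, 3) for v in analog]
print(samples)  # [5, 6, 7, 4, 1]
```

The list that is printed plays the same role as the stream of integers above: it is all the computer keeps of the original curve.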
Some of these measurements are very close to the waveform values, while others are not. For example, measurement A below is exactly right. However, measurement B requires a significant amount of round-off when we approximate the real analog value to the nearest integer.
The amount of round-off is called quantization error. It tends to be randomly distributed for audio signals of any complexity, so if you graph the amount of error for a stream of audio samples, you’ll get a series of random numbers. The quantization error for each measurement is graphed at the bottom of the illustration below.
In a digital system, when you play back random numbers as an audio signal, you get noise. This noise is added to the desired part of the signal. So, too much quantization error produces a noisy audio signal. For this reason, quantization error is also known as quantization noise.
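To make the idea of round-off concrete, the following sketch computes the quantization error for a handful of measurements. The amplitude values are invented for illustration; only the rounding rule comes from the discussion above.

```python
import math

# Hypothetical analog amplitudes on a 0-to-7 scale (assumed values, not from the figure)
analog = [5.0, 6.4, 2.5, 3.1, 0.8]

# Quantize: round each value to the nearest integer
quantized = [math.floor(v + 0.5) for v in analog]

# Quantization error: the difference between what was stored and what was measured
errors = [q - v for q, v in zip(quantized, analog)]

for v, q, e in zip(analog, quantized, errors):
    print(f"analog {v:4.1f} -> stored {q}   error {e:+.2f}")

# The error can never be larger than half a step
print(max(abs(e) for e in errors))  # 0.5
```

The first value needs no rounding at all, like measurement A, while the third sits exactly halfway between two integers and gets the largest possible error.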
To combat quantization noise, you merely need to increase the number of values that an integer can assume. In a digital system, integers are represented by groups of binary digits, or bits. The more bits you allocate to an integer, the greater the number of values that integer can have.
The following formula tells us exactly how the number of values an integer can have depends on the number of bits available to represent the integer.
number of values = 2^N (where N is the number of bits)
So 3 bits gives us 8 possible values, from 0 to 7, while 4 bits gives us 16 values, from 0 to 15. (If you want to confirm this, click the More about Binary Numbers button on page 1 of the MIDI app.)
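If you would rather check the formula with a few lines of code, this short Python sketch does so. The helper name `value_count` is just a label chosen for this example.

```python
def value_count(bits):
    """Number of distinct values an unsigned integer with `bits` bits can hold."""
    return 2 ** bits

for bits in (3, 4, 8, 16, 24):
    n = value_count(bits)
    print(f"{bits:2d} bits: {n:,} values, from 0 to {n - 1:,}")
```

The first two lines of output match the 3-bit and 4-bit cases described above.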
Notice that adding just a single bit doubles the number of values available. In the previous example, we used 3-bit quantization, with values from 0 to 7. Now we add one bit to perform 4-bit quantization. This gives us an additional horizontal integer line between every pair of lines in the previous example, doubling the resolution.
Now the circled measurement requires hardly any rounding off. Compare with the 3-bit case, shown again below.
The number of bits used to form an integer is known as the bit depth or word length. In the early days of samplers (mid-1980s), it was common to use 8-bit samples. While that is much better resolution than our 4-bit sampling system above, it is far worse than the 16-bit samples used in CDs. In fact, adding 8 bits to each integer gives you a resolution that is 256 times greater. So in a 16-bit system, each step between adjacent lines of an 8-bit system is divided into 256 finer steps, which means 255 additional integer grid lines between each pair of 8-bit lines.
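One way to see the effect of bit depth is to quantize the same test signal at several word lengths and measure the average round-off. The sketch below does this for a sine wave; the function name `rms_error`, the number of samples, and the 7.3 cycles are arbitrary choices for this example, and the exact figures will differ for other signals.

```python
import math

def rms_error(bits, num_samples=2000):
    """RMS round-off error of a quantized sine wave, as a fraction of full scale."""
    top = 2 ** bits - 1                    # highest integer level (lowest is 0)
    total = 0.0
    for i in range(num_samples):
        # Test sine wave scaled to span 0..top
        x = (math.sin(2 * math.pi * 7.3 * i / num_samples) + 1) / 2 * top
        q = math.floor(x + 0.5)            # nearest integer level
        total += ((q - x) / top) ** 2      # squared error relative to full scale
    return math.sqrt(total / num_samples)

for bits in (3, 4, 8, 16):
    print(f"{bits:2d}-bit: RMS error = {rms_error(bits):.6f} of full scale")
```

The error shrinks sharply as bits are added, which is the numerical counterpart of packing more grid lines into the same vertical range.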
Using 16-bit integers gives you 65,536 distinct values. These are normally positioned symmetrically around zero, because that is the value that represents ambient air pressure, or zero amplitude. So the range of values for 16-bit samples is between -32,768 and 32,767.
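The arithmetic behind that range can be sketched as follows. The helper name `signed_range` is hypothetical; the calculation simply splits the 2^N available values into a negative half and a non-negative half.

```python
def signed_range(bits):
    """Lowest and highest values of a signed integer with `bits` bits."""
    half = 2 ** (bits - 1)      # half of the 2^N values go below zero
    return -half, half - 1      # zero uses up one slot on the non-negative side

print(signed_range(16))  # (-32768, 32767)
print(signed_range(8))   # (-128, 127)
```

The positive limit is one smaller than the negative limit because zero itself takes one of the 65,536 values.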
What happens if the real analog amplitude value you want to represent requires an integer value greater than 32,767? Tough luck — there’s no way to represent a value greater than that with 16 bits, so the value will be truncated to 32,767. (The same thing happens on the negative end, with values that are less than -32,768.) This is called clipping, which produces distortion that can be severe if prolonged. The following illustration shows two places where the analog waveform exceeds the range of a 16-bit integer (the dotted segments of the curve), resulting in a clipped waveform (the flat segments at 32,767 and -32,768) after conversion to digital.
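Clipping is easy to imitate in code. In this minimal sketch, the function name `clip_to_16_bit` and the list of desired sample values are invented for illustration.

```python
INT16_MIN = -32768
INT16_MAX = 32767

def clip_to_16_bit(value):
    """Force a value into the range a 16-bit sample can hold."""
    return max(INT16_MIN, min(INT16_MAX, value))

# Hypothetical amplitudes, some of which overshoot the 16-bit range
wanted = [12000, 31000, 40000, 35000, -20000, -36000]
stored = [clip_to_16_bit(v) for v in wanted]
print(stored)  # [12000, 31000, 32767, 32767, -20000, -32768]
```

The two values above 32,767 both come out as 32,767, which corresponds to the flat top of a clipped waveform.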
The sound files we use every day in computer music software typically have word lengths of 16 or 24 bits. 24-bit audio has 256 times the resolution of 16-bit audio. This is especially useful when you work with sound that has very low amplitude, such as at the ends of long piano decays or reverberation tails. Even when the signal gets close to zero, there will still be sufficient resolution to represent it accurately without adding too much quantization noise.
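To get a feel for what this means at low amplitude, the sketch below counts how many integer levels are left for a quiet signal at each word length. The function name `levels_available` and the choice of a peak at one thousandth of full scale (60 dB down) are assumptions made for this example.

```python
def levels_available(peak_fraction, bits):
    """Whole integer levels between zero and the peak of a quiet signal."""
    full_scale = 2 ** (bits - 1) - 1       # largest positive sample value
    return int(peak_fraction * full_scale)

quiet_peak = 0.001   # assumed: a reverb tail peaking at 1/1000 of full scale

print(levels_available(quiet_peak, 16))  # 32
print(levels_available(quiet_peak, 24))  # 8388
```

With 16 bits, the quiet tail has only a few dozen levels to work with, so round-off is large relative to the signal. With 24 bits, the same tail still has thousands of levels.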
|8-bit integer||Historical interest only|
|16-bit integer||CD, consumer audio|
|24-bit integer||Professional audio|
|32- or 64-bit floating point||Internal representation in software|
Uncompressed digital audio, described above, gives you very high quality. But it may require too much network bandwidth to transmit over slower Internet or cell phone connections. That’s where the familiar MP3 format comes in.
Consider the example of CD audio, which uses 16-bit samples at a 44,100 Hz sampling rate. There are two parallel streams, one for each channel, to produce stereo. What is the transmission rate of CD-quality audio?
As long as you understand the terms involved, this is a straightforward math problem. For each of the two channels, there are 44,100 samples per second. Each of these samples requires 16 bits. Transmission rates are normally described in terms of the number of bits per second that must flow from the source to the destination. In our case:
44,100 samples per second * 16 bits per sample * 2 channels = 1,411,200 bits per second = 1411.2 kbps
where kbps is “kilobits per second,” or thousands of bits per second.
At 8 bits per byte, that is 176,400 bytes per second, which works out to a little over 10 MB (megabytes) per minute.
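The same calculation can be written out in a few lines of Python. The variable names are chosen for this example, and a megabyte is taken here to mean one million bytes.

```python
sample_rate = 44_100   # samples per second, per channel
bit_depth = 16         # bits per sample
channels = 2           # stereo

bits_per_second = sample_rate * bit_depth * channels
kbps = bits_per_second / 1000

bytes_per_second = bits_per_second / 8          # 8 bits per byte
megabytes_per_minute = bytes_per_second * 60 / 1_000_000

print(bits_per_second)                 # 1411200
print(f"{kbps:.1f} kbps")              # 1411.2 kbps
print(f"{megabytes_per_minute:.1f} MB per minute")
```

Changing `bit_depth` to 24 or `sample_rate` to 48,000 shows how quickly the data rate grows for professional formats.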
The MP3 format was invented in the mid-1990s to reduce the data transmission and storage demands of uncompressed audio. General-purpose compression such as ZIP, the kind used to reduce the size of word processing files, is not very effective for audio. Instead, MP3 uses perceptual coding to minimize the amount of data required to transmit and store audio. This takes advantage of psychoacoustic principles such as masking, in which, for example, loud sounds make it hard to hear soft sounds that are close in frequency. Instead of having a fixed number of bits per sample, as in uncompressed audio, an MP3 compressor allocates bits flexibly to different portions of the sound. Parts of the sound stream that require greater resolution get more bits. Parts that might be masked in our hearing get fewer bits. By being stingy with bits in this way, a compressor can save a lot of storage space. There are different data rates available for MP3 files, depending on the quality you want (e.g., 128 kbps, 160 kbps, 192 kbps, 256 kbps, etc.). These are anywhere from 5 to 11 times smaller than uncompressed CD-quality audio at 1411.2 kbps.
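The size savings quoted above follow directly from dividing the CD data rate by the MP3 data rate. This sketch assumes the uncompressed reference is CD-quality stereo at 1411.2 kbps; the variable names are chosen for this example.

```python
CD_KBPS = 1411.2                     # uncompressed CD-quality stereo
mp3_rates = (128, 160, 192, 256)     # common MP3 data rates in kbps

# How many times smaller each MP3 rate is than uncompressed CD audio
ratios = {rate: CD_KBPS / rate for rate in mp3_rates}

for rate, ratio in ratios.items():
    print(f"{rate} kbps MP3 is about {ratio:.1f} times smaller than CD audio")
```

The lowest rate gives the greatest saving (about 11 times), and the highest rate the least (between 5 and 6 times).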
The catch is that MP3 is a lossy format. Although it is based on sophisticated assumptions about how human hearing works, the bit allocation mentioned above discards information. When you turn an MP3 file into uncompressed audio, such as when you play it, some of the original sound data is gone forever. MP3 does a good job, but people who are very sensitive to audio quality, and are listening on professional equipment, can hear the artifacts of compression.
The AAC format (.m4a) is a more recent lossy format that achieves better sound quality than MP3 for the same amount of storage space. AAC is the default format for iTunes.