DIGITAL AUDIO
by Christopher Dobrian
from MSP: The Documentation
Cycling '74 and IRCAM, December 1997
http://music.arts.uci.edu/dobrian/digitalaudio.htm
and another ...
------------------------------------------------------------------------------
In order to create the best possible sounds of your own, it
is important to know something about digital sound. In this article I will try
to explain to you, in plain English, some things which will hopefully help you
a lot.
This document has been divided into several separate parts:
I think both parts contain useful facts that everyone who uses digital audio to create music should know. So take a look, you'll probably find something worth reading in here ;-)
As you probably know, sound is air which is vibrating very quickly. The speed of
these vibrations is called "frequency", which is a very important property of
sound, especially music. The frequency of a sound is measured in Hz (=Hertz,
named after a man called Hertz :-/ who did a lot of research into electromagnetic
waves some time ago), which counts the number of vibrations per second. Most people
can hear frequencies in the range between 100Hz and 15000Hz. Some people can hear
very high frequencies above 19000Hz, but scientists always assume that the human
ear is able to discern frequencies between 20Hz and 20000Hz, since those numbers
make their calculations a lot easier.
Here's a few examples of different frequencies, if you'd like to play with them
for a while:
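If you'd like to make test tones of your own, here is a minimal Python sketch. The function names (sine_samples, write_wav), the 440Hz pitch and the half-scale level are my own assumptions for illustration, not something from the original examples.

```python
import math
import struct
import wave

def sine_samples(freq_hz, seconds=1.0, sample_rate=44100, level=0.5):
    """Return 16-bit integer samples of a sine tone (level 1.0 = full scale)."""
    count = int(seconds * sample_rate)
    return [int(level * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
            for n in range(count)]

def write_wav(path, samples, sample_rate=44100):
    """Write mono 16-bit samples to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))

# One second of a 440Hz tone at half of full scale.
tone = sine_samples(440)
print(len(tone), "samples, peak value", max(tone))
```

Calling write_wav("tone_440.wav", tone) would save the tone so you can listen to it; change the first argument of sine_samples to hear other frequencies.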
Another very important property of sound is its level; most people call it volume. It is measured in dB (=decibel, named after a man called deciBell (NOT!!) all right, his real name was Bell, but he did invent the telephone and that is why us Dutch people still say 'mag ik hier misschien even bellen?' ('may I perhaps make a call here?') when we want to use your phone).
So why don't we measure loudness in bels instead of decibels? Well, mainly because
your ear really can discern an incredible range (1,200,000,000,000, that's 11 zeroes)
of different loudness levels, so they had to think of a trick (which I'm not
going to explain here, sorry!) to be able to describe an incredible range with
only a few numbers. They agreed to use tenths of bels, decibels, dB,
instead of bels.
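For the curious: the trick is a logarithm. The sketch below uses the standard decibel formulas (10 times the base-10 logarithm of a power ratio, 20 times for an amplitude ratio). Treating the big number above as a power ratio is my assumption, and the function names are made up for this example.

```python
import math

def power_ratio_to_db(ratio):
    """Decibels for a ratio between two powers (or intensities)."""
    return 10 * math.log10(ratio)

def amplitude_ratio_to_db(ratio):
    """Decibels for a ratio between two amplitudes (e.g. sample values)."""
    return 20 * math.log10(ratio)

# The ear's huge range shrinks to a handy number (assuming a power ratio).
print(round(power_ratio_to_db(1.2e12), 1))    # 120.8

# Half the power is about -3dB, half the amplitude about -6dB.
print(round(power_ratio_to_db(0.5), 2))       # -3.01
print(round(amplitude_ratio_to_db(0.5), 2))   # -6.02
```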
Most professional audio equipment uses a VU meter (=Volume Unit meter) which
shows you the input or output level of your equipment. This is very convenient,
but only if you know how to use it: A general rule is to set up the input and
output levels of your equipment so that the loudest part of the piece
you want to record/play approaches the 0dB lights. It is important to stay on
the lower side of 0dB, because if you don't, your sound will be distorted badly
and there's no way to restore that. If you're recording to (analog!) tape instead
of (digital) hard disk, you can increase the levels a bit: there is enough so-called
'headroom' (=ability to amplify a little more without distortion) to push the
VU-meters to +6dB. There is some more information on calibrating equipment levels
in the recording section below.
Some examples of different levels, if you'd like to play with them for a while:
maximum level
half power
very quiet
a little too loud: a lot of distortion
Okay, now that you know the most important things about sound, let's finally go to the digital bit (ooh, a pun :-/ ): I've just told you about the properties of 'normal' (analog) sound. Now I'll tell you what the most important properties of digital sound are.
First of all, the famous 'sample rate'. The sample rate of a piece of digital audio is defined as 'the number of samples recorded per second'. Sample rates are measured in Hz, or kHz (kiloHertz, a thousand samples per second). The most common sample rates used in multimedia applications are:
8000 Hz     really yucky
11025 Hz    not much better
22050 Hz    only use it if you have to
Professionals use higher rates:
32000 Hz    only a couple of old samplers
44100 Hz    ahh, what a relief
48000 Hz    some audio cards, DAT recorders
Some modern equipment has the processing power required to enable even higher
rates: 96000Hz or even an awesome 192000Hz will possibly / probably be the
professional (DVD?) standard rates in a couple of years. The advantages of a higher
samplerate are simple: increased sound quality. The disadvantages are also simple:
a sample with a higher samplerate requires an awful lot more disk space than
a low-rate sample. But with the hard disk and CD-R prices of today that isn't
too much of a problem anymore.
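To get a feeling for "an awful lot more disk space", here is a small Python sketch for uncompressed audio. The function name and the choice of one minute of 16-bit stereo are assumptions made for the example.

```python
def bytes_needed(sample_rate_hz, bits, channels, seconds):
    """Size of uncompressed audio: one value per channel per sample tick."""
    return sample_rate_hz * bits * channels * seconds // 8

# One minute of 16-bit stereo at a few of the rates mentioned above.
for rate in (8000, 22050, 44100, 96000):
    megabytes = bytes_needed(rate, 16, 2, 60) / 1_000_000
    print(f"{rate:>6} Hz: {megabytes:.2f} MB per minute")
```

At 44100Hz this comes to 10,584,000 bytes per minute, which is why a 74-minute CD holds roughly 780 MB of audio.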
....But Why?!
To answer that, let's look at a single period of a simple sine wave:
[Figure: a single period of a sine wave]
When recording a certain frequency, you will need at least (but preferably more than) two samples for each period, to accurately record its peak and valley. This means you will need a samplerate which is at least (more than) twice as high as the highest frequency you'd like to record, which, for humans, is around 20000Hz. That's why the pros use 44100Hz or higher as the minimum samplerate! They can record frequencies up to 22050Hz with that. (Now you know why an 8000 Hz sample sounds so horrible: it only plays back a tiny part of what we can hear!)
Using an even
higher samplerate, like 96000Hz, you can record higher frequencies, but you
won't hear things like 48000Hz anyway. That's not the main goal of those super-rates.
If you record at 96000Hz, you will have more than four samples for each 20000Hz
period, so the chance of losing high frequencies will decrease dramatically!
It will take quite a few years for consumer level soundcards to support these
numbers, though. There are a few pro cards which already do, but you could easily
buy a small car for the same money...
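The arithmetic behind these numbers is easy to check. A quick Python sketch (the function names are my own, chosen for this example):

```python
def highest_recordable_hz(sample_rate_hz):
    """Half the samplerate: two samples per period is the bare minimum."""
    return sample_rate_hz / 2

def samples_per_period(freq_hz, sample_rate_hz):
    """How many samples land inside one period of the given frequency."""
    return sample_rate_hz / freq_hz

for rate in (8000, 44100, 96000):
    print(rate, "Hz samplerate:",
          "records up to", highest_recordable_hz(rate), "Hz,",
          samples_per_period(20000, rate), "samples per 20000Hz period")
```

At 44100Hz a 20000Hz tone gets only a little over two samples per period; at 96000Hz it gets 4.8.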
That's enough about frequency for now. As I said before, another very important
property of sound is its level. Let's have a look at how digital audio cards
process the sound levels.
The capacity of digital audio cards is measured in bits, e.g. 8-bit soundcards, 16-bit soundcards. The number of bits a sound card can manage tells you something about how accurately it can record sound: it tells you how many different levels it can detect. Each extra bit on a sound card gives you another 6dB of accurately represented sound (Why? Because each extra bit doubles the number of levels, and doubling an amplitude is worth about 6dB). This means 8-bit soundcards have a dynamic range (=difference between the softest possible signal and the loudest possible signal) of 8x6dB=48dB. Not a lot, since people can hear up to 120dB. So, people invented 16-bit audio, which gives us 16x6dB=96dB. That's still not 120dB, but as you know, CDs sound really good, compared to tapes. Some freaks, and that includes myself ;-) want to be able to make full use of the ear's potential by spending money on soundcards with 18-bit, 20-bit, or even 24-bit or 32-bit ADCs (Analog to Digital Converters, the gadgets that create the actual sample) which gives them dynamic ranges of 108dB, 120dB, or even 144dB or 192dB.
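The "6dB per bit" rule can be checked with a few lines of Python. This is only a sketch of the theoretical figure (20 times the base-10 logarithm of the number of levels); the function name is an assumption for the example.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range: ratio of the largest value to one step."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 18, 20, 24, 32):
    print(f"{bits:>2} bits: {dynamic_range_db(bits):.1f} dB")
```

The exact figure is about 6.02dB per bit, so the round numbers in the text (48dB, 96dB, 144dB) are slightly on the low side.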
Unfortunately, all of the dynamic ranges I mentioned are strictly theoretical maximum levels. There's absolutely no way in the world you'll get 96dB out of a standard 16-bit multimedia sound card!!! Most professional audio card manufacturers are quite proud of a dynamic range over 90dB on a 16-bit audio card. This is partly because of the fact that it's not that easy to put a lot of electronic components on a small area without a lot of different physical laws trying to get attention. Induction, conduction or even bad connections or (very likely) cheap components simply aren't very friendly to the dynamic range and overall quality of a soundcard. But there's another problem, that will become clear in the next paragraph.
Back in the old days, when the first digital pianos were put on the market (most of us weren't even born yet), nobody really wanted them. Why not? Such a cool and modern instrument, and you could even choose a different piano sound!
The problem with those things was that they weren't as sophisticated as today's
digital music equipment. Mainly because they didn't feature as many bits (and
so they weren't even half as dynamic as the real thing) but also because they
had a very clearly rough edge at the end of the samples.
Imagine a piano sample
like the one you see here. It slowly fades out until you hear nothing.
At least, that's what you'll want... As you can see by looking at the two separate
images, that's not at all what you get... These images both are extreme close-ups
of the same area of the original piano sample. The highest image could be the
soft end of a piano tone. The lowest image however looks more like Morse code
than a piano sample! The sample has been converted to 8 bit, which leaves only
256 levels instead of the original 65536. The result is devastating.
Imagine playing the digital piano in a very soft and subtle way, what'd you
get? Some futuristic composition for square waves! That's not what you paid
for ;-) This froth is called quantization noise, because it is noise
that is generated by (bad) quantization.
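You can reproduce the effect without a piano. The Python sketch below quantizes a very quiet sine wave (standing in for the soft tail of the piano tone) to 16 and to 8 bits and counts how many different levels survive. The quantize function and the 1% level are assumptions picked for the demonstration.

```python
import math

def quantize(x, bits):
    """Round x (between -1.0 and 1.0) to the nearest level of a signed
    integer format with the given number of bits, and scale it back."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale

# A quiet tail: one period of a sine at 1% of full scale.
tail = [0.01 * math.sin(2 * math.pi * n / 100) for n in range(100)]

levels_16 = {quantize(x, 16) for x in tail}
levels_8 = {quantize(x, 8) for x in tail}

print("different levels at 16 bit:", len(levels_16))
print("different levels at 8 bit: ", len(levels_8))
```

At 8 bit only three levels are left (one step down, zero, one step up), which is exactly the blocky, Morse-like shape described above.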
There is a way to prevent this from happening, though. While sampling the piano,
the soundcard can add a little noise to the signal (about 3-6dB, that's literally
a bit of noise). The noise nudges the quiet signal back and forth across the steps
between two levels, so that you get a little more realistic variation instead
of a square wave. The funny part is that you'll hardly hear the noise, because it's
so soft and it doesn't change as much as the recorded signal, so your ears automatically
forget it. This technique is called dithering. It is also used in some
graphics programs, e.g. for reducing the number of colors in an image.
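Here is a rough Python sketch of why a bit of noise helps. It takes a steady level that is smaller than one quantization step and rounds it many times, with and without noise. The uniform noise of plus or minus half a step is an assumption for simplicity; real converters may shape their dither differently.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def quantize_step(x):
    """Round a value, expressed in quantization steps, to a whole step."""
    return round(x)

signal = 0.3      # a level of only 0.3 steps: too small to register
count = 10000

plain = [quantize_step(signal) for _ in range(count)]
dithered = [quantize_step(signal + random.uniform(-0.5, 0.5))
            for _ in range(count)]

print("average without dither:", sum(plain) / count)
print("average with dither:   ", sum(dithered) / count)
```

Without dither every sample rounds to zero and the signal is simply gone. With dither the individual samples are noisy, but their average lands close to the true level of 0.3, so the quiet detail is still in there.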
Another problem with digital audio equipment is called jitter. Until now, I've always assumed that the soundcard recorded the sample at exactly 44100Hz, taking one sample every 1/44100 second. Unfortunately that is -totally- unrealistic. There *always* is a tiny timing error which causes the sample to be taken just a little too late or just a little too soon.
Does this make a big difference then? Well, you could start nagging about everything,
but then you'd probably have bought a more expensive soundcard in the first
place. The really bad part is that jitter is frequency dependent. Because it's
related to the timing of the sample, it can change the recorded waveform
just a little. If the card records a sample just a little too soon or too late,
it stores a value that is slightly off from what the signal really was at the
intended moment. This matters most at the highest frequencies, because the signal
changes fastest there, so the influence of a little timing error is much bigger.
Typical jitter-times go between 1.0 x 10^-9 seconds (that's a NANOsecond, read: almost nothing)
and 1.0 x 10^-7 seconds (that's a hundred NANOseconds, not a lot
more) but they make the difference between a 'pro' sound and a 'consumer' sound
on e.g. different CD-players.
When you record
a sample with your sound card, it goes through a lot of stages before you can
store it on your hard disk as a sound file. Fortunately you don't have to worry
about these stages, because modern sound cards and samplers take care of them
for you.
I'm going to be a big bore and tell you about these stages anyway.
Let's see what happens when you press 'rec':
1) The sound card starts a very accurate stopwatch (the samplerate).
2) Then it transforms the sound coming in: it simply cuts off the very high frequencies which it cannot handle. This cripples the sound a lot, but it is required to prevent even more serious damage to the sound, which would make the sound unrecognizable. This is a low-pass (cut the 'high' frequencies, let the 'low' frequencies pass through) anti-aliasing (smoothing, blurring) filter (because it takes away some parts and leaves the rest).
3) Every time the stopwatch has completed a cycle, the sound card's ADC looks at the filtered input signal. It calculates how loud the incoming sound is at that exact moment in time (very much like a microphone would measure air pressure) and transforms the loudness level into the nearest digital number.
4) Finally it shouts that number to the computer, which stores it somewhere in memory, probably on a hard disk.
Sound card manufacturers put a brickwall-filter (look at the image below!) in
their sound card, to prevent a very nasty side-effect called 'foldover'.
Foldover is a pretty difficult concept, but I'll try to keep it simple.
It's more or less the same thing that happens when you look at a car's wheel when it drives past you very quickly. You'll sometimes see the wheel moving backwards. Another example can be found in old western movies where you'll see a train going by. The 'wheels' of the train will be moving backwards too, if the train's going fast enough.
All these 'illusions'
are foldover-effects. They occur when a fast system at regular intervals analyzes
something which is moving even faster than the system itself.
When recording at 22050Hz, your sound card will simply not be able to record
any frequencies above 11025Hz, because you need at least two samples for each
period, as described above. Without the low-pass filter, the sound card would
blindly try to record those frequencies. But afterwards, when you play back
the sample, you'll hear a totally different frequency instead of the original
one. Just like the car's wheel that seems to be moving backwards, while it really
isn't.
(The frequency you'll actually hear equals the sampling frequency minus the
original frequency, e.g. 22050-12050=10000Hz, instead of the original frequency,
in this case 12050Hz).
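The foldover arithmetic can be checked in Python. In this sketch (the function name is my own) the first part calculates the frequency you'd hear, and the second part shows that a 12050Hz tone and a 10000Hz tone produce the very same samples at 22050Hz, so the sound card can't tell them apart.

```python
import math

def foldover_frequency(freq_hz, sample_rate_hz):
    """Frequency that is actually heard when freq_hz is sampled
    without a low-pass filter in front of the converter."""
    folded = freq_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded

print(foldover_frequency(12050, 22050))   # 10000
print(foldover_frequency(5000, 22050))    # 5000 (below half the rate: untouched)

# Sample both tones (as cosines) and compare the numbers.
rate = 22050
high = [math.cos(2 * math.pi * 12050 * n / rate) for n in range(50)]
low = [math.cos(2 * math.pi * 10000 * n / rate) for n in range(50)]
largest_difference = max(abs(a - b) for a, b in zip(high, low))
print(largest_difference < 1e-9)          # True
```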
[Figure: a brickwall filter at 4000Hz]
Therefore, the maximum frequency that can be recorded with a certain sample rate is half the sample rate. That frequency is called the Nyquist frequency, sometimes abbreviated to fN, after a man named Harry Nyquist, who worked at Bell Telephone Laboratories and more or less invented audio sampling. A big guy in digital audio. Anyway, to prevent all that from happening, the sound card manufacturers put a special filter in their card (see figure of brickwall filter on the right).
This low-pass filter removes high frequencies like any equalizer or Hi-Cut Switch does, except it is *much* more aggressive. You can see that the filter allows all sound below 1000Hz to pass through, and that it gives the frequency range of 1000Hz-3500Hz a small boost. (This boost is necessary to be able to cut off the higher frequencies with such violence.) Frequencies above 4000Hz are eliminated extremely aggressively. That is why they call it a brickwall-filter, because of the wall-like slope.
The filter displayed above might be used for a sample rate of about 8000Hz, since an 8000Hz sample has a Nyquist frequency, the maximum recordable frequency, of 4000Hz. This makes it very important to choose the appropriate sample rate for your sample; that is, if you've got a legitimate reason not to record at 44100Hz, or higher ;-)
Let's go through this step by step.
We'll start by selecting File->New, something which every sample editor I know can handle ;-). You'll want to select the number of bits you'll want to use for each sample. You'll also want to select the sample rate. My advice is: pick the highest your hardware can handle. That is most likely 16 bits at 44100Hz, since most, if not all, consumer sound cards support CD-quality playback & record.
Then let the band, or whatever, play for a while, to see if your recording levels aren't too high or too low. Your program probably supports input monitoring, and if yours doesn't, it should! Get yourself another program ;-) You'll probably see a variant of the good ole VU-meter, like the one to the right. The loudest part of the sound you want to capture to disk should be somewhere very near 0.0dB, but it should not, ever, never ever!! exceed 0.0dB, since that results in very nasty distortion, which is cool on analog recorders but really horrible in the digital world.
If you want that distortion effect, get a program to do it for you, but don't record at too high a level! Sonic Foundry's Sound Forge has a really good Distortion feature. Also, there are lots of DirectX plugins which emulate tube compression and tape saturation etc. This type of digital distortion is called 'clipping' because all samples that exceed the maximum level are 'clipped' (cut off and reduced) to the maximum level.
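In code, clipping is nothing more than this. A minimal Python sketch, assuming 16-bit samples (which run from -32768 to 32767); the function name and the test values are made up for the example.

```python
def clip16(sample):
    """Force a sample value into the 16-bit range, as a converter would."""
    return max(-32768, min(32767, sample))

too_hot = [0, 20000, 40000, -50000]
print([clip16(s) for s in too_hot])   # [0, 20000, 32767, -32768]
```

Everything above the maximum is flattened to the maximum, so the tops of the waveform turn into straight lines, and once recorded that way the original peaks cannot be recovered.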
Don't set your
recording levels too low, though. It will further reduce the accuracy of your
home recording, since multimedia cards already add a very significant bit of
noise. In fact, they sometimes hardly leave you any dynamic range at all!
So, be very picky about your input levels.
Next, think about the source of your recording. A microphone? A keyboard or
synthesizer? A DAT tape? If the source already is digital, like with DAT and
CD, please go ahead and stay digital! Use a digital connection between the DAT
and the soundcard, to prevent the chain of digital-to-analog conversion
-> transmission through a cheap cable -> analog-to-digital conversion
from adding noise or distortion!
If you're recording
with a microphone, first let the microphone record a minute or so of 'silence'.
Then play that recorded 'silence' back over headphones and listen to the amount
of noise coming from the room. Be sure to keep this data, because some good
programs can eliminate that noise from the actual recording, by using the data
as a 'noise print' (They analyze the noise print data and then 'subtract' it
from the real recording. Sound Forge and CoolEdit have this great feature.)
Also, if you have the opportunity, try several different microphones for the
same recording. Learn to trust your ears. If you have several different recordings
of the same event, pick the one that sounds best. Don't automatically pick the
one recorded by the most expensive mic. That! Does! Not! Work! Pick the one
that sounds best. You'll be surprised to hear how many top hits were recorded
with cheap mics. But I'm not saying you should be using cheap mics... There
are several pretty good all-round microphones available from $30 (like the Behringer
XM-2000). A really good mic for vocals and guitar is the SM-58 by Shure. These
are a little more expensive (over $100), but they are used all over the world
in pro studios. The problem with these microphones is, you'll need a
pre-amplifier too, because the original microphone signal is very weak, and
an 'XLR-cable' to connect it to your gear. Most mixers have microphone pre-amps
on them. If you're looking for a good value-for-money mixer, I suggest you take
a look at Behringer's website. They're not 100% top quality, but if 90% is good enough
for you (It's that last 10% of perfection which makes audio equipment so darn
expensive) Behringer is the place to be. No, I'm not getting paid to
tell you this ;-)
If you're recording from a different piece of hardware, e.g. directly from a synthesizer/keyboard,
check your manual to see if your hardware has balanced outputs. If it does,
you'll need to get/make two stereo jack plugs and three wires of the same length,
or even better: an insulated cable with three separately insulated wires (that's
a multi-buck issue, though...) to make sure your audio isn't distorted before
it goes into your sound card's inputs.
A normal wire has 1) a signal wire and 2) a ground wire. If you use normal wire over long distances, preferably close to stage lighting ;-) you'll notice the wire picks up an awful lot of noise and buzzing on the way. This has something to do with induction and magnetic fields but all you'll need to know is that it sucks. To prevent such 50Hz (AC power!) buzzing, the professionals use balanced cables.
The balanced cable system is a very nice way of connecting equipment over long distances without loss of sound quality or unwanted induction. This is possible because a balanced cable has three wires instead of two: 1) a signal wire, 2) an inverted signal wire and 3) a ground wire. At the output of the synthesizer / mixer / whatever, the output signal is routed to both the signal-wire and the inverted-signal-wire.
The signal going
to the inverted signal wire is then inverted (multiplied by -1, turned upside
down, given a phase shift of 180 degrees) and transported together with the signal
wire all the way through the cable to the other connector, and on the way both
wires pick up all the usual noise and hums. But when the signal arrives at
its destination, the inverted signal is inverted again, so that the signal it
was carrying is back to normal again. But this inversion also inverts
the noise and buzz, so now we have: a signal wire with 1) the signal and 2)
the noise, and we have the re-inverted (=normal!) wire with 1) the signal and
2) the inverted noise. These two are mixed together by the equipment:
signal + signal + noise - noise, which gives twice the signal strength and no
noise whatsoever!
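The balanced cable trick is easy to simulate. The Python sketch below makes the idealized assumption that both wires pick up exactly the same hum and noise; in a real cable the cancellation is very good but not perfect. All names and signal values are invented for the example.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# The wanted signal, and the interference picked up along the cable.
signal = [math.sin(2 * math.pi * n / 50) for n in range(200)]
noise = [0.5 * math.sin(2 * math.pi * n / 441) + random.uniform(-0.1, 0.1)
         for n in range(200)]

# Both wires pick up the same noise (assumption), but one carries
# the signal upside down.
signal_wire = [s + x for s, x in zip(signal, noise)]
inverted_wire = [-s + x for s, x in zip(signal, noise)]

# The receiver re-inverts the inverted wire and mixes it with the
# signal wire: (s + x) + -(-s + x) = 2s.
received = [a - b for a, b in zip(signal_wire, inverted_wire)]

leftover = max(abs(r - 2 * s) for r, s in zip(received, signal))
print("largest leftover noise:", leftover)
```

The leftover is zero apart from floating point rounding: the signal comes out at twice its strength and the noise is gone.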
Well, if you're interested in reading more on (digital) audio, there are a lot
of sites to visit. I suggest you try some of them:
I've really enjoyed
writing this article. I think there will be more to come, if it's okay with
Thomas :-)
You can contact me with questions at the email address below. Tell me what you
think of this article, I'd like to know.
Mazzel!
Joost Boomkamp
Student of Sound Design
Student of Sound & Music Software Development
Utrecht School of the Arts
fac: Art, Media & Technology
dep: Music Technology & Audio Design
e-mail: joost.boomkamp@student-kmt.hku.nl