all about the algorithm, but which one will win?
Of the countless human pursuits touched by technology, music
has been among the most profoundly transformed. Beginning
more than a century ago, when Thomas Edison's phonograph
gave rise to the recorded music industry, technology has
brought music to the masses with steadily increasing efficiency,
fidelity, and convenience. Today, the Internet, digital
recording, and new storage technologies are coming together
to prompt another momentous shift. It is liberating music
from the last link to Edison's era: the dependence on physical,
recorded media that has long confined it.
Of the technologies fomenting this revolution, one of the
most pivotal, and interesting, is the compression algorithm.
The most common example is the ubiquitous MP3, which was
a key enabler of Napster's rise in its copyright-flouting
initial incarnation. MP3 is just one of an expanding array
of such algorithms (more than 100 at last count) that
also includes such contenders as WAV, WMA, Ogg, AAC, and
AC-2. All of them use a variety of clever tricks to compress
music files 90 percent or more, so that the data can be
more economically transmitted over a network, such as the
Internet, and stored on a computer or music player. They're
all vying for a central role in the global recorded music
industry, which now generates US $32 billion a year in
alliances are being formed. And unlike many previous technology-related
business battles, technology may actually be a significant
factor in this one. Consider Sony Corp.'s slick new music
player, the NW-HD1. Praised for its compact size, long
battery life, and clever touch-sensitive controller, the
device nevertheless has been widely and bitterly criticized
for its choice of compression algorithm: ATRAC3, a proprietary
system used by Sony alone.
Not since the days of the PC operating system wars in the 1980s,
arguably, has a software issue held so much sway over an
emerging category of consumer electronics. And this time,
at least, technology will weigh fairly heavily as the marketplace
sorts out winners and losers.
Among the algorithms, the basic tradeoff is between sound quality
and how much they can compress the music files. But there
are other important considerations, including the extent
to which the full-fidelity, uncompressed files can be re-created
from the compressed files, how copy protection is implemented,
and how secure the downloaded files are from unauthorized
The contest is far from over. But already, glimpses of a seriously
streamlined future for the sale and distribution of recorded
music are apparent, ones that are showing the way for
digital movies, too. The first part of the transition is
well under way: Apple's iTunes alone is selling about $500
000 worth of music a week; additional online music services
from RealAudio, Wal-Mart, Napster, and others are also
doing brisk business. The advantages over the old industry
model are overwhelming: record companies don't have to
ship plastic discs all over the world, and music fans need
no longer clutter their homes with racks of CDs or tapes.
Instead, somewhere near their favorite audio listening
spots are a hard drive (or two or three) and a computer
displaying a list of thousands and thousands of songs,
arranged by album, by artist, or simply by mood. No more
shuffling through stacks of discs; a click of a mouse changes
The revolution is not happening just in the home. It is truly
everywhere. Once compressed, music files can be quickly
and easily loaded into a compact, shirt-pocket-size player,
where they are stored on a miniature hard-disk drive or
in flash memory. The hard-disk-based systems, such as Apple's
ubiquitous iPods, can store thousands of songs: your
entire music library, probably.
That is just a hint of what's to come. Today, early adopters
are using wireless networks to move music from their computers
to audio gear all over their homes. When wireless personal
area networks based on the IEEE 802.15.3 standard become
commercially available, people will be storing movies this
way as well. Eventually, CDs and DVDs will join the vinyl
albums gathering dust in the backs of closets, and yet
music and movies will be everywhere: in file servers,
magnetic and semiconductor memories, communications lines,
and in the air itself.
KEEP IN MIND that
it was only five years ago that the music industry was
facing a civil war over the next-generation disc-based
music format, the successor to the wildly successful
CD. At that time, hardly anybody doubted that the music
would be encoded optically on a round plastic disc the
size of a CD. The quibbling was over the technology of
that encoding, and two leading contenders emerged: DVD-Audio
and Super-Audio CD [see box, "A New Generation of CDs"].
But then, while the audiophiles were fiddling, a few technophiles
were burning. Computer-savvy music buffs had started quietly
applying some of the earliest compression formats, such
as MP3 and WAV, to move their CD collections to their computer
hard drives. At the same time, other people were expanding
their collections by downloading music from the Web onto
It was
a grass-roots revolt that the consumer electronics manufacturers
couldn't ignore, and in 1998 they came out with the first
portable digital music players. By then, Napster, the file-sharing
system, was on its way to becoming a major, if controversial,
force in the recorded music industry.
With the move to the big time, challenges have come to
compression formats that the pioneers didn't worry about.
Playing compressed files on a high fidelity home system
demands the ability to re-create a high quality signal.
Copyright requirements, too, not a major consideration
until recently, are now at the forefront as companies
turn downloading into a legitimate business.
The challenge for technologists is to accommodate these new
needs without interfering with the essential purpose of
compression algorithms: making music files smaller. To
understand this function, consider the compact disc. Music
is converted into digits for a compact disc by a technique
called pulse code modulation. Basically, the music's two
channels are sampled 44 100 times a second, and each sample
is converted into a pair of 16-bit numbers, one for each
of the two stereo channels. Those numbers are then put
onto the disc. Generally, CDs can store a maximum of 74
minutes of music, with one minute of music occupying nearly
10 megabytes, including raw data and overhead.
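The arithmetic behind that figure can be checked with a short sketch. It uses only the sampling rate, sample size, and channel count given above; the function name is made up for illustration, and the result is the raw payload only, so disc-level overhead is not modeled.

```python
# CD audio parameters as stated in the text.
SAMPLE_RATE_HZ = 44_100      # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2                 # stereo


def pcm_bytes_per_minute(sample_rate=SAMPLE_RATE_HZ,
                         bits=BITS_PER_SAMPLE,
                         channels=CHANNELS):
    """Raw PCM payload for one minute of audio, in bytes (no overhead)."""
    bytes_per_second = sample_rate * channels * bits // 8
    return bytes_per_second * 60


per_minute = pcm_bytes_per_minute()
print(f"{per_minute:,} bytes per minute")        # 10,584,000
print(f"{per_minute * 8 / 60 / 1000:.1f} kb/s")  # 1411.2
```

The raw stream works out to roughly 10.6 million bytes per minute, or about 1411 kilobits per second, which is the baseline every compression ratio in this article is measured against.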
Even with a broadband connection, it would take about three
and a half minutes to download each minute of music on
a CD. A typical 50-minute CD would take 3 hours to download.
Without compression, even the largest iPod, packing a 40-GB
hard drive, could store only about 67 hours of music. It
is compression that turns the little player into a library
in your pocket, capable of carrying about 500 hours of
The trick is to take away bits without degrading the fidelity
of the sound that the listener hears. To do this, the algorithms
exploit quirks in human hearing and, more specifically,
in the way the human brain processes sound.
Hearing varies from person to person, based on such things as age,
sex, and previous exposure to loud noise. But even what
is commonly called perfect hearing isn't so perfect. Most
young people can hear frequencies between about 20 hertz
and 20 kilohertz. But most adults, particularly older ones,
cannot hear all that much above 16 kHz. And even within
the wide swath of frequencies that most people can hear,
some bands register more loudly than others.
Then there are the brain's perceptual quirks: it has trouble
distinguishing tones that are closely spaced in frequency,
and it reflexively masks any sounds that occur immediately
after sudden, louder ones. (These two are known as frequency
and temporal masking.) It also hears tones differently
when they are sounded in isolation or accompanied by other
tones. Therefore, masking algorithms that take into account
tones and the harmonics surrounding them are somewhat more
successful than those that ignore tone differences.
COMPACT DISCS IGNORE these limitations of human hearing,
faithfully preserving all the sound between 20 Hz and
about 20 kHz, and doing it flat across the entire frequency
range, as though we perceived all bands equally. Compression
algorithms, on the other hand, start by analyzing the
mathematical patterns of the digitized sound. They compare
these patterns with psychoacoustic models (models
of human auditory perception, basically), which vary
from algorithm to algorithm. The algorithm uses the models
to deemphasize or discard portions of the signal that
the listener isn't likely to hear.
Then the compression algorithm typically makes another pass
through the compressed signal, looking for opportunities
to further compress by eliminating redundancies. One way
of doing this is called Huffman coding, which represents
repetitive sounds more compactly. As a simple symbolic
illustration, ABCABCABCABC would become (ABC) * 4.
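The symbolic illustration above is closer in spirit to run-length coding; Huffman coding proper saves space by giving frequent symbols shorter bit strings than rare ones. The following is a minimal sketch of that idea on a made-up string of letters, not the entropy coder of any particular audio format. The function names and the sample message are assumptions for illustration.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix-free code: frequent symbols get shorter bit strings."""
    counts = Counter(text)
    if len(counts) == 1:                      # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:                # prefix-free, so first hit is right
            out.append(inverse[current])
            current = ""
    return "".join(out)


message = "AAAAAAAABBBC"                      # made-up sample data
code = huffman_code(message)
bits = encode(message, code)
print(len(bits), "bits instead of", 8 * len(message))  # 16 bits instead of 96
```

Because decoding recovers the input exactly, this stage loses nothing; all of the loss in a lossy codec comes from the earlier perceptual step.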
Of course,
because the musical experience is very subjective, there
is no single, universally accepted set of rules for compression
algorithms. In creating such an algorithm, also known as
a codec, for coder/decoder, the developer must decide which
and how many bits to throw out, striving for the best balance
between quality and compression. It's a trickier challenge
than it may seem. If you lean too far toward quality, you
risk winding up with a poor compression ratio and no guarantee
of consumer appreciation (recall the famous VHS/Betamax
videotape battle, in which the longer-playing format vanquished
the higher-quality one).
On the
other hand, going too far toward minimizing file size will
bother aficionados, who periodically want to pipe the music
from their pocket players through their home stereos. They
would be dismayed to find that a high compression ratio
had obliterated details otherwise audible with a good amplifier
The latter problem illustrates the importance of the tradeoffs
between what compression specialists call lossiness and
losslessness, the degree to which the original sound data
files can be exactly reconstructed from the compressed
data. In a perfectly lossless algorithm, the decoded stream
is bit-for-bit identical to the original stream, not just
representative of that data.
A common
strategy for achieving lossless compression emphasizes
taking advantage of repeating patterns in waveforms. The
compression algorithm uses these patterns to predict the
next value of the signal and then encodes the usually small
difference between the expected value and the actual value.
Such techniques can compress an audio file to nearly half
its original size.
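The predict-and-encode-the-difference strategy can be sketched in a few lines. This version assumes the simplest possible predictor (the next sample equals the previous one) and made-up sample values; real lossless codecs use more elaborate predictors and then entropy-code the residuals.

```python
def encode_residuals(samples):
    """First-order prediction: predict that each sample equals the previous one."""
    residuals = []
    previous = 0
    for s in samples:
        residuals.append(s - previous)   # store only the prediction error
        previous = s
    return residuals


def decode_residuals(residuals):
    """Rebuild the original samples by accumulating the prediction errors."""
    samples = []
    previous = 0
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples


pcm = [1000, 1004, 1009, 1011, 1010, 1006, 1001, 997]   # made-up samples
res = encode_residuals(pcm)
print(res)                               # [1000, 4, 5, 2, -1, -4, -5, -4]
print(decode_residuals(res) == pcm)      # True
```

After the first value, every residual fits in a few bits even though the samples themselves need 16, and the decoder still reproduces the input exactly.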
Lossy formats are not without advantages; they give you somewhat
smaller file sizes with higher-quality reproduction. The
JPEG image format used in photography is one of the most
common in this category. But losslessness counters with
another advantage: it enables conversion to other formats.
After all, if you can re-create the original digitized
signal, or something very close to it, then you can convert
that signal to some other compression format. This feature
allows backward compatibility with existing hardware and
THE ULTIMATE FORMAT will have a high degree of lossless
compression, because this attribute is essential for
the futuristic scenario in which people simply store
all their music on their home computers, playing it in
high fidelity through their home stereos via ultrawideband.
Another important goal for the ideal compression algorithm is easy
streaming. Streaming allows data to be transferred in a
stream of packets that are interpreted and played as they
arrive. When you are record shopping online and you click
on a sample of music from a CD you're considering buying,
the music is streamed to your PC. Whether or not a format
is streamable is determined by the complexity of the algorithm,
the power of the user's computer, and, most critical, the
speed of the connection. In other words, if the algorithm
is simple enough for the computer to execute in real time
and the connection speed is fast enough to keep up, then
the music will be streamable.
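That condition can be written down as a simple check. This is a deliberately crude sketch: the function and its `decode_speed_factor` parameter (seconds of audio the computer can decode per second of processing) are hypothetical, and buffering, protocol overhead, and network jitter are ignored.

```python
def is_streamable(bit_rate_kbps, connection_kbps, decode_speed_factor):
    """Return True if both the link and the decoder can keep up in real time.

    decode_speed_factor is a hypothetical measure: seconds of audio decoded
    per second of processing time. A value of at least 1.0 means real time.
    """
    fast_enough_link = connection_kbps >= bit_rate_kbps
    fast_enough_cpu = decode_speed_factor >= 1.0
    return fast_enough_link and fast_enough_cpu


print(is_streamable(128, 56, 20.0))     # False: dial-up link is too slow
print(is_streamable(128, 1500, 20.0))   # True: broadband link, fast decoder
print(is_streamable(128, 1500, 0.5))    # False: decoder cannot keep up
```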
Last but certainly not least, the compression format will have
to support digital rights management, or technical protection; that
is, it must include technology that limits unauthorized
copying and distribution.
Transportable compressed formats meet these ideal goals
to varying degrees. The most successful compression standard
by far to date is MP3, officially called MPEG-1 Layer III,
introduced as part of the MPEG-1 standard in 1992. The
MPEG standards come from the Moving Picture Experts Group,
a working group of the International Organization for Standardization,
in Geneva. MPEG-1 was the first compression format to come
out of that group, optimized for encoding video on CD-ROMs.
MP3 is a lossy format that compresses CD music to one-tenth
its original size and works well with streaming. At its
heart, the MP3 format uses an algorithm that takes the
data contained in CD music relating loudness to specific
points in time and transforms it instead into data relating
loudness to specific frequencies. Once that is done, extraneous
information can be eliminated; for instance, if at
any point a frequency is too quiet for a typical listener
to hear, it can be thrown away.
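The time-to-frequency step and the discarding of too-quiet components can be illustrated with a toy example. This sketch assumes a plain discrete Fourier transform and a single fixed loudness floor (`floor_db`, a made-up parameter); actual encoders use more elaborate transforms and a perceptual model rather than one threshold.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform: time samples in, frequency bins out."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def drop_quiet_bins(spectrum, floor_db=-40.0):
    """Zero every bin more than 40 dB (by default) below the loudest bin."""
    peak = max(abs(c) for c in spectrum)
    threshold = peak * 10 ** (floor_db / 20)
    return [c if abs(c) >= threshold else 0j for c in spectrum]


N = 64
frame = [math.sin(2 * math.pi * 4 * t / N)             # loud tone in bin 4
         + 0.001 * math.sin(2 * math.pi * 20 * t / N)  # very quiet tone in bin 20
         for t in range(N)]

kept = drop_quiet_bins(dft(frame))
surviving = [k for k, c in enumerate(kept) if c != 0]
print(surviving)   # [4, 60]: the loud tone and its mirror-image bin
```

The quiet tone, 60 decibels below the loud one, vanishes from the spectrum, so only two of the 64 bins need to be stored for this frame.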
MP3 gained popularity (and notoriety) as the format used for
music file swapping in the mid-1990s. A wide range of products
support it, including the majority of PDAs and dedicated
portable music players. The quality of MP3 audio depends
on the complexity of the signal to be encoded and on the
quality of the encoder, which, as anyone who has used
several different MP3 encoders will tell you, varies widely.
For some listeners, MP3 audio is perfectly adequate. MP3
offered one of the highest compression ratios at the time
of its introduction, but it has since been surpassed by
newer formats. The quality is dependent on the compression
ratio selected. MP3 is easily streamable, but it contains
no digital rights management tools, the reason it was
the darling of Napster and other file-sharing systems.
Introduced in 2001, mp3PRO is the next generation of MP3, offering
the same quality as MP3 at half the file size, with a compression
ratio of 20 to 1. It takes advantage of an audio compression
technique developed by Coding Technologies AB in Stockholm,
Sweden, that allows more of the data for the higher frequencies
to be eliminated. It then reconstructs the high frequencies
using an analysis of the low-frequency data along with
additional guidance information transmitted with the encoded
Music that is mp3PRO-encoded is hard to discern from CD audio,
especially when played back on a relatively low-fidelity
computer or personal audio player. This format is backward
compatible, which means that mp3PRO files can be played
in ordinary MP3 players, albeit with some degradation of
quality. Thomson Consumer Products and Royal Philips Electronics
have several players that support this new format [see
photo, "20:1 Compression"].
It has a compression ratio twice that of MP3 and is almost
lossless and easily streamable. It does not, however, have
digital rights management tools.
WAV (WAVEform audio format) is an IBM and Microsoft audio file
format standard for storing audio on computers. WAV was
one of the first such formats, introduced 14 years ago
in Microsoft Windows 3.1, and is now most commonly used
on Windows-based PCs. WAV files are virtually the same
quality as files on audio CDs, but their very large size, 10
megabytes per minute of audio, makes them unsuitable
for everyday exchange via the Internet.
WAV audio can also be edited and manipulated with software
relatively easily. As file sharing over the Internet has
become popular, the WAV format has declined in popularity,
primarily because WAV files take a long time to send. WAV
has one of the lowest compression ratios and is virtually
lossless, but it is not streamable and has no digital rights management tools.
Advanced Audio Coding (AAC) is a lossy data compression scheme intended
for audio streams. Designed to replace MP3, AAC is an extension
of the MPEG-2 international standard, which is widely used
for the transmission of digital video. It was further improved
in subsequent digital video formats, MPEG-4, MPEG-4 Version
2, and MPEG-4 Version 3. It has a wider range of sampling
frequencies than official MP3 (8 kHz to 96 kHz, compared
with the 16 kHz to 48 kHz of MP3) and handles high frequencies
AAC provides better and more consistent quality than MP3 at
equivalent or slightly lower bit rates. In fact, depending
on the MP3 encoder used, 96-kilobit-per-second AAC can
give the same or better perceptual quality as 128-kb/s MP3.
Two other formats, aacPlus and Dolby AAC, both standardized
in 2001, enhance the standard AAC with proprietary technologies.
Trademarked as aacPlus by Coding Technologies, the technology
is also called AAC+.
AAC is sometimes confused with AC-2, a different codec. Developed
by Dolby Laboratories, AC-2 was designed for professional audio
transmission and storage applications where encoder and
decoder complexity can be similar. AAC is the format embraced by Apple
in its iTunes service and by Real Audio in its new online
music store. In the iTunes version, Apple has added a digital
rights management system it has tagged FairPlay, a name
that Apple bought from Veridisc Inc., of Mundelein, Ill.
Microsoft's response to MP3 was the Windows Media Audio standard, WMA,
which was released in December 2000. With the introduction
of Apple's iTunes Music Store, WMA has been positioned
as a competitor to the AAC format used by Apple. In compression
ratio and quality, it is similar to MP3, and it offers
the advantage of copy protection: protected songs cannot
be redistributed.
Files in WMA format can be played using Windows Media Player,
Winamp, and even iTunes for Windows. With the advent of
Windows Media Player 9, a new lossless codec has been introduced
to accompany the existing lossy codec. The new release
also supports variable bit rates. WMA features strong digital
The Ogg standard was begun in 1993 by the Xiph.org Foundation.
It is an open-source project and can therefore be used
without licensing fees. Various components of the project
are intended to stand as alternatives to codecs that require
license fees, such as MP3 and most of the rest. The Ogg
codec includes lossy formats (which do a serviceable job
of reproducing the audio when decoded but do not reproduce
the original bit stream) such as Speex, which handles
voice data at low bit rates (from about 8 to 32 kb/s per
channel), and Vorbis, which handles general audio data
at mid- to high-level bit rates (from about 32 to 256 kb/s per channel).
The Ogg standard also includes lossless formats, such as Ogg's
original codec, Squish, along with its successor, FLAC
(Free Lossless Audio Codec), but has no digital rights management tools.
As one
of the lossy offshoots of Ogg, the Vorbis format has a
small but die-hard following that appreciates the format's
good fidelity and the fact that it costs nothing. Though
we may never see Ogg surpass MP3, Ogg has made major inroads
in the video-game sector because game developers can use
it without paying fees to anyone.
The lack of widely available audio hardware players is hindering
Ogg's growth in mainstream audio, although such devices
do exist, including the Neuros MP3 Digital Audio Computer
with a firmware upgrade, the Rio Karma, the Xclef HD800,
and the Cowon iAudio M3.
WHO'S PULLING into the lead? It's hard to say
at the moment. MP3 is still the undisputed leader, although
its limitations, such as degraded quality at low
bit rates, are starting to show. Certainly, with the
success of Apple's iTunes service, the AAC format has
surged into a strong position.
None of these formats, however, has all the characteristics
necessary to dominate the market. None combines both lossless
transmission and storage with the built-in ability to adapt
to a variety of playback hardware.
The ultimate format is most likely to come out of the Moving
Picture Experts Group, which has already brought us MP3,
mp3PRO, and AAC. This will be true despite the patents
held on the MPEG algorithms that require them to be licensed
for use in commercial products. Even though open standards
such as Ogg have a large number of devotees and voluntary
developers, they will have a difficult battle competing
with internationally recognized standards in the commercial
Regardless of which specific flavor of compression ultimately
wins, there is no question that compression will change
the way we collect and listen to music. Simply put, if
you're a music fan, the best is yet to come. The audio
library of the future will reside in a device about the
size of a deck of playing cards that contains at least
2000 hours of your favorite music, has a wireless interface
that communicates with your computer and your home and
car audio systems, has a battery life of at least 90 days,
and costs no more than a PDA or a cellphone. It will be
music to your ears.
For general audio information, see http://www.audiolinks.nl/.
For information on CD and DVD technology, see http://www.disctronics.co.uk/technology/index.htm.
For information on audio formats and testing, see http://www.litexmedia.com/article/.
For a concise audio history, see http://history.sandiego.edu/gen/recording/notes.html.
For an overview on how CDs work, see http://entertainment.howstuffworks.com/cd.htm.