This is a follow-up to my 2009 article Native Audio in the Browser, which covers the basics of HTML5 audio. It may well be worth reading if you want to get a feel for the <audio> element and associated API.
Now, two and a half years later, it’s time to see how things are progressing. With many new advanced audio APIs being actively worked on and plenty of improvements to the existing native audio we all know and love, it’s certainly an exciting time to revisit the heady world of <audio>.
A good way of understanding how the land lies is by going through a few use cases. That’s what I’ll attempt to do in this post.
So how do we get started? Well, there are a few things we need to do to prepare the ground. Let’s tackle MIME types first.
MIME Types #
MIME types (also known as Internet Media Types) are a way of defining file formats so that your system knows how to handle them.
Server Side #
First things first: your media server should be configured to serve correct MIME types. In the case of the Apache web server, this means adding the following lines to your server configuration (for example, httpd.conf or an .htaccess file):
    # AddType TYPE/SUBTYPE EXTENSION
    AddType audio/mpeg mp3
    AddType audio/mp4 m4a
    AddType audio/ogg ogg
    AddType audio/ogg oga
    AddType audio/webm webma
    AddType audio/wav wav
Client Side #
When defining sources in your code or markup, you can also specify the MIME type, which will help the browser identify the media correctly.
To set up HTML5 audio in the most robust manner, you could write something like this:
    <audio>
      <source src="elvis.mp3" type='audio/mpeg; codecs="mp3"'>
      <source src="elvis.oga" type='audio/ogg; codecs="vorbis"'>
      <!-- add your fallback solution here -->
    </audio>
Here we define the element and the sources to use. The browser will pick only one; it won’t play both. In this code, we also place a fallback solution after the <source> elements.
Along with the source, we specify a type attribute. Although not strictly necessary, this attribute allows the browser to know the MIME type and the codecs of the supplied media before it downloads it. If it is not supplied, the browser will guess and take a trial-and-error approach to detecting the media type.
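If you build your source lists in script, it can help to derive the type value from the file extension. Here is a minimal sketch that simply mirrors the Apache mappings listed earlier; the names AUDIO_MIME_TYPES and mimeTypeFor are my own, not part of any API.

```javascript
// Extension-to-MIME-type map, mirroring the AddType lines shown for Apache.
var AUDIO_MIME_TYPES = {
  mp3: 'audio/mpeg',
  m4a: 'audio/mp4',
  ogg: 'audio/ogg',
  oga: 'audio/ogg',
  webma: 'audio/webm',
  wav: 'audio/wav'
};

// Returns the MIME type for a file name or URL, or null if the extension is unknown.
function mimeTypeFor(filename) {
  // Ignore any query string or fragment before looking at the extension.
  var path = filename.split(/[?#]/)[0];
  var match = /\.([a-z0-9]+)$/i.exec(path);
  if (!match) {
    return null;
  }
  var extension = match[1].toLowerCase();
  return Object.prototype.hasOwnProperty.call(AUDIO_MIME_TYPES, extension)
    ? AUDIO_MIME_TYPES[extension]
    : null;
}
```

The result can be used as the type attribute of a <source> element, with a codecs parameter appended if you know which codec the file uses.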
Cool. So now we know how to define our audio sources, and the browser will happily pick the first source that it supports. But what if we want to supply the correct source in a more dynamic manner?
Knowing in Advance: canPlayType Can Help, Probably #
Fortunately the audio API provides us with a way to find out whether a certain format is supported by the browser. But first, here’s a quick recap on how we manipulate the audio element via the API.
If you mark up your element in HTML as we did in our previous example, you can grab the <audio> element by doing the following:
    // index is the position of the element among the page's <audio> elements
    var audio = document.getElementsByTagName('audio')[index];
    // or, if you gave it an id attribute
    var audio = document.getElementById('my-audio-id');
Alternatively, you can create the element entirely in JavaScript:
    var audio = new Audio();
Once you have your audio element, you’re ready to access its methods and properties. To test format support, you can use the canPlayType method, which takes a MIME type as a parameter:
    audio.canPlayType('audio/ogg');
You can even explicitly include the codec:
    audio.canPlayType('audio/ogg; codecs="vorbis"');
canPlayType returns one of three values:
- “probably”
- “maybe”
- “” (the empty string).
The reason we have these odd return types is because of the general weirdness surrounding codecs. The browser can only guess at whether a certain codec is playable without actually trying to play it.
So to test for support, you could do this:
    var audio = new Audio();
    var canPlayOgg = !!audio.canPlayType && audio.canPlayType('audio/ogg; codecs="vorbis"') != "";
All we’re doing here is checking that canPlayType is supported (!! effectively casts to a boolean) and then checking that canPlayType of our chosen format doesn’t return an empty string.
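To supply a source dynamically, the same check can be run over a list of candidates. The following is a rough sketch rather than a definitive approach: pickPlayableSource and the shape of the candidate objects (src and type) are my own invention, and unlike the browser's own source selection, it prefers a "probably" answer over an earlier "maybe".

```javascript
// Returns the candidate the element is most confident it can play, or null.
// `audio` is anything with a canPlayType method (an <audio> element in a browser).
// `candidates` is an array of objects such as { src: 'elvis.oga', type: 'audio/ogg' }.
function pickPlayableSource(audio, candidates) {
  if (!audio || typeof audio.canPlayType !== 'function') {
    return null; // no HTML5 audio support at all
  }
  var firstMaybe = null;
  for (var i = 0; i < candidates.length; i++) {
    var answer = audio.canPlayType(candidates[i].type);
    if (answer === 'probably') {
      return candidates[i]; // strongest answer wins immediately
    }
    if (answer === 'maybe' && firstMaybe === null) {
      firstMaybe = candidates[i]; // remember the first weaker match
    }
  }
  return firstMaybe;
}

// In a browser you might use it like this:
//   var audio = new Audio();
//   var source = pickPlayableSource(audio, [
//     { src: 'elvis.mp3', type: 'audio/mpeg; codecs="mp3"' },
//     { src: 'elvis.oga', type: 'audio/ogg; codecs="vorbis"' }
//   ]);
//   if (source) { audio.src = source.src; }
```

When nothing is returned, that is the cue to reach for your fallback solution.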
Current Browser Codec Support #
Let’s check the codec support in the current crop of modern browsers.
|Desktop Browser||Version||Codec Support|
|Internet Explorer||9.0+||MP3, AAC|
|Chrome||6.0+||Ogg Vorbis, MP3, WAV†|
|Firefox||3.6+||Ogg Vorbis, WAV|
|Safari||5.0+||MP3, AAC, WAV|
|Opera||10.0+||Ogg Vorbis, WAV|
† WAV since Chrome 9
|Mobile Browser||Version||Codec Support|
|Mobile Safari (iPhone, iPad, iPod Touch)||iOS 3.0+||MP3, AAC|
The good news is that at the time of writing, it’s estimated that around 80% of browsers now support HTML5 audio.
The bad news is that there is still no consensus on which codec to support, so you’ll need to provide both MP3 and Ogg Vorbis sources in order to take full advantage of HTML5 audio.
Containers, Formats, and File Extensions (oh, and MIME types again) #
Above, I’ve referred to the audio formats as they’re commonly known, but technically we should refer to their container format. (A container can contain more than one format — e.g., MP4 can contain AAC or AAC+.)
|Container||Format(s)||File Extensions||MIME Type||Codec String|
|MP4||AAC, AAC+||.mp4, .m4a, .aac||audio/mp4||mp4a.40.5|
|OGA/OGG||Ogg Vorbis||.oga, .ogg||audio/ogg||vorbis|
<audio> and we’re not afraid to use it! #
Okay, we’ve done the minimum amount of work to get our audio element set up and playable. What else can we do? At the moment, we’re relying on the browsers’ default audio players, each of which looks and works a little bit differently than the rest. Perhaps we’d like to customise the experience and create our own. To help us do that, the <audio> element supports several different properties exposing its current state.
Some of the more commonly used properties:
|Property||Description||Return Value|
|currentTime||playhead position||double (seconds)|
|duration||media duration||double (seconds); read-only|
|muted||is volume muted?||boolean|
|paused||is media paused?||boolean|
|volume||volume level||double (between 0 and 1)|
Using these properties is pretty straightforward. For example:
    var audio = document.getElementById('my-audio-id');
    var duration = audio.duration; // NaN until the media's metadata has loaded
duration now holds the duration (in seconds) of the audio clip.
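As an illustration of how these properties might feed a custom player, here is a small sketch that turns them into a status line. The helper names formatTime and describePlayback are hypothetical, and the function only reads the properties from the table above.

```javascript
// Formats a time in seconds as m:ss. Returns a placeholder when the value is
// not usable (duration is NaN before metadata loads and can be Infinity for streams).
function formatTime(seconds) {
  if (typeof seconds !== 'number' || !isFinite(seconds) || seconds < 0) {
    return '--:--';
  }
  var whole = Math.floor(seconds);
  var minutes = Math.floor(whole / 60);
  var secs = whole % 60;
  return minutes + ':' + (secs < 10 ? '0' : '') + secs;
}

// Builds a one-line summary from an <audio> element (or any object with the
// same properties): paused, muted, volume, currentTime and duration.
function describePlayback(audio) {
  var state = audio.paused ? 'paused' : 'playing';
  var volume = audio.muted ? 'muted' : Math.round(audio.volume * 100) + '%';
  return state + ' ' + formatTime(audio.currentTime) + ' / ' +
    formatTime(audio.duration) + ' (volume ' + volume + ')';
}
```

Because it only reads properties, the same function works with a plain object, which makes it easy to try out without a browser.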
Buffering, Seeking, and Time Ranges #
The situation is improving in this area, as browser makers start to implement key parts of the spec.
The API provides attributes called buffered and seekable that can be used in situations where we want to ascertain which part of the media has been buffered, preloaded, or is ready to be played without delay.
Let’s first take a look at the buffered and seekable attributes. Both return a TimeRanges object. The TimeRanges object is a list of time periods containing start and end times that can be referenced by their indexes.
buffered Attribute #
The buffered attribute will return the time ranges that have been completely downloaded. A little bit of code:
    // returns TimeRanges object of buffered media
    var buffered = audio.buffered;
    // returns time in seconds of the end of the last buffered TimeRange
    // (end() needs an index, and there may be no ranges yet)
    var bufferedEnd = buffered.length > 0 ? buffered.end(buffered.length - 1) : 0;
TimeRanges Object #
The TimeRanges object contains data on the parts of buffered media in the form of one or more (you guessed it) time ranges. A TimeRanges object consists of these properties:
- length: number of time ranges
- start(index): start time in seconds of a particular time range
- end(index): end time in seconds of a particular time range
You may be wondering in what situation the
TimeRanges object would contain more than one time range. Imagine the user clicks forward to a portion of unbuffered media. The idea is that the media would then start buffering from that point, and you’d have two time ranges:
    ------------------------------------------------------
    |=============|                    |===========|     |
    ------------------------------------------------------
    0             5                    15          19    21
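So in this case:
- audio.buffered.length returns 2
- audio.buffered.start(0) returns 0
- audio.buffered.end(0) returns 5
- audio.buffered.start(1) returns 15
- audio.buffered.end(1) returns 19
- audio.buffered.end(audio.buffered.length - 1) returns 19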
Note that if the user is actively seeking through the media throughout the buffering process, a contiguous buffered progress bar actually makes little sense. Also consider that some browsers will read part of the end of the file to establish duration and so create two time ranges almost immediately. Now you’re starting to appreciate why making an accurate buffered progress bar is a little tricky!
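One way to cope with multiple ranges is to stop thinking of a single progress value and work with the whole list. The sketch below is one possible approach, not the only one; toRangeArray and bufferedFraction are names I've made up, and they accept any object with the length, start(index) and end(index) members described above.

```javascript
// Copies a TimeRanges-like object into a plain array of { start, end } pairs,
// which is handy for drawing each buffered segment separately.
function toRangeArray(ranges) {
  var result = [];
  for (var i = 0; i < ranges.length; i++) {
    result.push({ start: ranges.start(i), end: ranges.end(i) });
  }
  return result;
}

// Returns the fraction (0 to 1) of the media covered by the ranges.
// Returns 0 when the duration is not yet known.
function bufferedFraction(ranges, duration) {
  if (typeof duration !== 'number' || !isFinite(duration) || duration <= 0) {
    return 0;
  }
  var total = 0;
  for (var i = 0; i < ranges.length; i++) {
    total += ranges.end(i) - ranges.start(i);
  }
  return Math.min(total / duration, 1);
}
```

With the two ranges from the diagram (0 to 5 and 15 to 19) and a duration of 21 seconds, bufferedFraction reports 9/21, while toRangeArray gives you the two segments to draw individually.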
You can check out TimeRanges in real time using this handy HTML5 Media Event Inspector.
Seeking and Seekable #
Seeking is the act of looking forward (or backward) in a media file. This usually happens when a section of media is requested before it’s finished loading.
The seeking attribute can be used to determine whether that part of the media is being actively “seeked”. When it returns true, the portion of the media the user requested is still being loaded.
To recap a little, the buffered property tells us what’s been downloaded and is often used as an indication of what part of the media can be directly played. If the browser supports it, however, it may make more sense to use the seekable attribute to determine which parts of the media can be jumped to and played immediately.
The seekable attribute returns a TimeRanges object of time ranges that can be played immediately. This uses a technology known as byte-range requests, which allows part of the content to be requested over HTTP. In short, we don’t have to load all the data prior to the desired part in order to play it.
    // Is the player currently seeking?
    var isSeeking = audio.seeking;
    // Is the media seekable?
    var isSeekable = audio.seekable && audio.seekable.length > 0;
    // Time in seconds within which the media is seekable
    // (end() needs the index of the last time range).
    var seekableEnd = isSeekable ? audio.seekable.end(audio.seekable.length - 1) : 0;
Time ranges can be confusing. audio.seekable.end(audio.seekable.length - 1) actually tells us the end point of the last time range (not the end point of all seekable media). In practice, though, this is good enough, as the browser either enables range requests or doesn’t. If it doesn’t, then audio.seekable will be equivalent to audio.buffered, which will give a valid indication of the end of seekable media.
Note that media being in a “seekable” state is different than media being in a “buffered” state. Media doesn’t have to be buffered to be seekable.
Buffered and seekable data can be useful information, but it would be too easy if there weren’t a few gotchas.
- Preloading is not supported in all older audio-capable browsers (Opera 10/Firefox 3.6).
- The buffered attribute is not always supported (Blackberry PlayBook).
- Not all HTML5 browsers allow byte-range seeking — for example, Safari on Windows. In this case, you can still seek within the downloaded content. It downloads from the start until the end, as does Flash.
If you want to deliver the best possible solution to your users, you’ll need to feature test.
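A feature test for the gotchas above might look something like the following sketch. It assumes you are happy to fall back from seekable to buffered, and finally to zero, when an attribute is missing or empty; lastRangeEnd and seekLimit are hypothetical helper names.

```javascript
// Returns the end (in seconds) of the last range of a TimeRanges-like object,
// or null if the object is missing, incomplete or empty.
function lastRangeEnd(ranges) {
  if (!ranges || typeof ranges.end !== 'function' || !(ranges.length > 0)) {
    return null;
  }
  return ranges.end(ranges.length - 1);
}

// Best guess at how far into the media we can jump right now.
function seekLimit(audio) {
  var limit = lastRangeEnd(audio.seekable);
  if (limit === null) {
    // seekable is unsupported or empty: fall back to what has been downloaded
    limit = lastRangeEnd(audio.buffered);
  }
  return limit === null ? 0 : limit;
}
```

A custom seek bar could clamp the requested position to seekLimit(audio) before assigning it to currentTime.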
A Note about Preloading #
I mentioned the preload attribute in the previous article. This attribute accepts three possible values:
- none: Do not preload any media. Wait for a play event before downloading anything.
- metadata: Preload just the metadata. Grab the start and the end of the file via range request and determine the duration.
- auto: Preload the whole file. Grab the start and the end of the file to determine duration, then seek back to the start again for the preload proper.
When it comes to preloading, remember that it’s a request or hint to tell the browser how you’d like to preload the media. The browser is not under any obligation to fulfill that request. For this reason, certain browsers may handle these requests differently.
It’s worth mentioning the played property in passing. This property tells us which time ranges have been played within the media. For example:
    // returns a TimeRanges object
    var played = audio.played;
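As a quick sketch of what you could do with played, the helpers below work out how much of a clip has actually been listened to and whether a given moment has been heard. They rely on the ranges being non-overlapping, which is how the spec describes TimeRanges; the function names are mine.

```javascript
// Total number of seconds covered by the played ranges.
function totalPlayedSeconds(played) {
  var total = 0;
  for (var i = 0; i < played.length; i++) {
    total += played.end(i) - played.start(i);
  }
  return total;
}

// True if the given time (in seconds) falls inside any played range.
function wasPlayed(played, time) {
  for (var i = 0; i < played.length; i++) {
    if (time >= played.start(i) && time <= played.end(i)) {
      return true;
    }
  }
  return false;
}
```

This could be used, for instance, to mark the parts of a podcast a listener has already heard.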
Media Events #
Underpinning both the native audio and video APIs is a comprehensive set of events. A full list can be found in the WHATWG version of the spec.
Attaching handlers to media events allows you to easily react to changes in state. For example, it might be useful to update the time info in a custom-built player each time the timeupdate event occurs.
A quick summary of the more commonly used media events:
|Event||Description|
|ended||Playback has stopped as the end of the media was reached.|
|pause||The media playback has been paused. Note there is no stop event.|
|play||The media has started playing.|
|timeupdate||The current playback position changed (usually every 250ms).|
|volumechange||The volume changed.|
Other useful events:
|Event||Description|
|canplay||The media can be played but may need to pause while the file is downloaded.|
|canplaythrough||At current download rates, it is estimated that the media can be played from start to finish without pause.|
|progress||The browser is fetching the media data (usually every 250ms).|
Again, the HTML5 Media Event Inspector is your friend.
Additionally, you can test browser support by using areweplayingyet.org. It’s worth checking out the code behind the tests to understand the goings-on under the hood.
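To make the timeupdate example mentioned earlier a little more concrete, here is a minimal sketch of wiring up handlers. watchProgress and its callback signature are my own invention; the only API it relies on is addEventListener and removeEventListener with the timeupdate and ended events.

```javascript
// Calls onProgress(currentTime, duration, hasEnded) whenever the playhead moves
// or playback reaches the end. Returns a function that detaches the handlers.
function watchProgress(audio, onProgress) {
  function report() {
    onProgress(audio.currentTime, audio.duration, audio.ended === true);
  }
  audio.addEventListener('timeupdate', report, false);
  audio.addEventListener('ended', report, false);
  return function stopWatching() {
    audio.removeEventListener('timeupdate', report, false);
    audio.removeEventListener('ended', report, false);
  };
}

// In a browser you might use it like this (the element id is an assumption):
//   var stop = watchProgress(document.getElementById('my-audio-id'),
//     function (time, duration, hasEnded) { /* update your player UI here */ });
```

Returning the detach function keeps things tidy when a custom player is torn down or swapped for another.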
Streaming audio is a common requirement. This is an area that until recently has been dominated by Adobe’s Flash technology. Proprietary server technologies and protocols are well-established. One of the leading examples of this is Adobe’s Real Time Messaging Protocol (RTMP). However, current browsers only support streaming over HTTP, and so something like Flash is required to process RTMP streams.
Currently, audio-enabled browsers only support SHOUTcast and Icecast servers that stream audio over HTTP. SHOUTcast is proprietary and allows streaming of MP3 and AAC files only, while Icecast is non-proprietary and supports Ogg Vorbis as well as MP3 and AAC.
|Stream||AAC||MP3||Ogg Vorbis||Native Support|
|RTMP||Yes||Yes||No||No (requires Flash)|
An Evolving Spec (Or, “Whoa, this thing is moving!”) #
As the audio spec is still evolving, there are a couple of inconsistencies in browser implementations to watch out for. These generally affect older browsers.
load Method #
Until fairly recently, the load method was required in order to tell the browser about a change in the media source and to get it to update appropriately. Now, media.load() causes the element to reset and to start selecting and loading a new media resource from scratch.
So for older browsers, such as Firefox 3.6, in order to change the media source, you’ll not only need to change the source value but also issue a load command:
    var audio = new Audio();
    audio.setAttribute("src","mynew.mp3");
    audio.load(); // required for 'older' browsers
When Browsers Go Off-Spec #
When creating cross-browser solutions, it’s useful to know how closely each browser follows the W3C spec and where it deviates.
Autoplay and Volume #
iOS and Blackberry devices ignore the autoplay attribute and media element volume changes, whereas Android supports autoplay but not volume changes.
Effectively, on Blackberry and iOS devices, autoplay functionality is disabled as both require a user-initiated event to “kick it off”. This has the unfortunate side-effect that any audio suffers a delay between user interaction and playback.
To mitigate these issues, you may want to take a look at Doctor Remy Sharp’s excellent article on Audio Sprites and fixes for iOS.
Simultaneous Playback of Multiple Audio Elements #
A particular annoyance for developers of anything but the most simple audio web apps: not all browsers will play multiple audio elements simultaneously. This is a particular problem for games developers. Again, we see this issue with iOS and Blackberry devices.
Operating System Dependence #
Safari (5+) relies on QuickTime being installed. This is rarely a problem unless you are running it on Windows.
Internet Explorer (9+) relies on the codecs being present at the operating system level. As it only runs on Windows, this is fortunately almost always the case.
What’s New? #
There are a few new features and whole new APIs being specified for web-based audio, so let’s take a look at what’s around the corner.
A Change of Pace #
A couple of noteworthy upcoming features are playbackRate and defaultPlaybackRate. As you can probably imagine, these fellas let us alter the speed and direction of playback. This functionality could be used for fast-forward and rewind functions or perhaps to allow users to tweak the playback speed so they can fit more podcasts into their day.
- audio.playbackRate returns 1 at normal speed and acts as a multiple that is applied to the rate of playback. For example, setting playbackRate to 2 would double the speed, while setting it to -1 would play the media backwards.
- audio.defaultPlaybackRate is the rate at which the audio will play after you pause and restart the media (or issue any event for that matter).
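Since these features are not available everywhere, it seems sensible to feature test before relying on them. The sketch below is one cautious way to do that; setPlaybackSpeed is a hypothetical helper, and the try/catch reflects my assumption that some implementations may reject rates they cannot honour (negative ones, for example).

```javascript
// Tries to set the playback speed. Returns the rate actually in effect
// afterwards, or null when the element does not expose playbackRate at all.
function setPlaybackSpeed(audio, rate) {
  if (!('playbackRate' in audio)) {
    return null; // feature not supported
  }
  try {
    audio.playbackRate = rate;
  } catch (e) {
    // The implementation refused this rate: keep whatever rate was in effect.
  }
  return audio.playbackRate;
}
```

Comparing the return value with the rate you asked for tells you whether the change took effect, so a "2x" button can be hidden or disabled when it would do nothing.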
Media Fragments #
Currently, if we wish to reference some part of a media file, we often first have to download at least some of the media we don't actually need. The W3C Media Fragments proposal was created to address this issue and others, such as the retrieval of an area of video or just the associated text track. In the case of audio, it will allow us to specify which parts we want to download using parameters on the source URI.
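As a taste of what those URI parameters look like, here is a small sketch that builds a temporal fragment. It assumes the t=start,end syntax (in seconds) from the Media Fragments URI drafts; withTimeFragment is my own helper name, and whether a given browser honours the fragment is another matter entirely.

```javascript
// Appends a temporal media fragment to a URL, for example "elvis.oga#t=10,20".
// Omitting start (or passing 0) produces "#t=,20"; omitting end produces "#t=10".
function withTimeFragment(url, start, end) {
  var base = url.split('#')[0]; // drop any existing fragment
  var hasStart = typeof start === 'number' && start > 0;
  var hasEnd = typeof end === 'number';
  if (!hasStart && !hasEnd) {
    return base; // nothing to add
  }
  return base + '#t=' + (hasStart ? start : '') + (hasEnd ? ',' + end : '');
}
```

The resulting string can be used anywhere you would normally put the media URL, such as the src of a <source> element.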
Advanced Audio APIs: The Future Sound of Browsers #
Not to be outdone, Google released the Web Audio API for Chrome and put it forward as a W3C proposal. The Web Audio API is a higher-level implementation that takes an audio-node based approach and provides many useful functions.
Both implementations address the desire by developers to do more with audio — to create, manipulate, analyse, and mix audio on the fly. In effect, we may one day be able to do in a browser anything we can currently do with native applications.
So what if we want to play with advanced audio in the browser? Well, two separate implementations currently exist. If you want to write applications that use both APIs in any reasonable way, you need third-party bridging libraries to abstract away the differences.
Also fairly new on the scene is Mozilla's MediaStream Processing API, which takes another approach to audio and video streams to allow us to manipulate at both high and low levels.
For those curious about advanced web audio, I wrote about the differences in approaches and the different libraries that allow us to write cross-browser solutions in HTML5 Audio APIs: How Low can we Go?
The good news is that all relevant parties are talking, and things seem to be buzzing in the W3C Audio Working Group. It looks like everyone is coming together to put their ideas into a unified Audio Processing API, which is rapidly approaching publication as a Web Standard.
Although browser implementations of the current HTML5 audio spec are improving, there are still a few issues to watch out for when creating comprehensive cross-browser solutions. You certainly need to be aware of browser limitations on platforms like iOS. Hopefully, as support matures, these issues will disappear.
There are also signs that we may be seeing the end of the requirement for different browser-dependent codecs. Opera Mobile already supports MP3 where the underlying OS supports it, and Mozilla look like they may enable this too. If this option exists for mobile browsers, it's not a huge stretch to imagine that desktop browsers will follow suit.
So lots happening and lots already achieved. The rough edges of existing browser implementations are being smoothed, consensus and standards are being forged, and we can see the foundations of some very exciting new technologies being established.
The future of web-based audio looks bright!