HTML5 Audio — The State of Play

This is a follow-up to my 2009 article Native Audio in the Browser, which covers the basics of HTML5 audio. It may well be worth reading if you want to get a feel for the <audio> element and associated API.

Now, two and a half years later, it’s time to see how things are progressing. With many new advanced audio APIs being actively worked on and plenty of improvements to the existing native audio we all know and love, it’s certainly an exciting time to revisit the heady world of <audio>.

A good way of understanding how the land lies is by going through a few use cases. That’s what I’ll attempt to do in this post.

So how do we get started? Well, there are a few things we need to do to prepare the ground. Let’s tackle MIME types first.

MIME Types

MIME types (also known as Internet Media Types) are a way of defining file formats so that your system knows how to handle them.

Server Side

First things first: your media server should be configured to serve correct MIME types. In the case of the Apache web server, this means adding the following lines to your .htaccess file:

# AddType TYPE/SUBTYPE EXTENSION
AddType audio/mpeg mp3
AddType audio/mp4 m4a
AddType audio/ogg ogg
AddType audio/ogg oga
AddType audio/webm webma
AddType audio/wav wav

Client Side

When defining sources in your code or markup, you can also specify the MIME type, which will help the browser identify the media correctly.

To set up HTML5 audio in the most robust manner, you could write something like this:

<audio>
   <source src="elvis.mp3" type='audio/mpeg; codecs="mp3"'>
   <source src="elvis.oga" type='audio/ogg; codecs="vorbis"'>
   <!-- add your fallback solution here -->
</audio>

Here we define the element and the sources to use. The browser will only pick one. It won’t play both. In this code, we also place a fallback solution after the <source> elements.

Along with the source, we specify a type attribute. Although not strictly necessary, this attribute allows the browser to know the MIME type and the codecs of the supplied media before it downloads it. If it is not supplied, the browser will guess and take a trial-and-error approach to detecting the media type.

Cool. So now we know how to define our audio sources, and the browser will happily pick the first source that it supports. But what if we want to supply the correct source in a more dynamic manner?

Knowing in Advance: canPlayType Can Help, Probably

Fortunately the audio API provides us with a way to find out whether a certain format is supported by the browser. But first, here’s a quick recap on how we manipulate the audio element via the API.

If you mark up your element in HTML as we did in our previous example, you can grab the <audio> element by doing the following:

var audio = document.getElementsByTagName('audio')[index];

// or, if you gave it an id attribute
var audio = document.getElementById('my-audio-id');

Alternatively, you can also create your element entirely in JavaScript:

var audio = new Audio();

Once you have your audio element, you’re ready to access its methods and properties. To test format support, you can use the canPlayType method, which takes a MIME type as a parameter:

audio.canPlayType('audio/ogg');

You can even explicitly include the codec:

audio.canPlayType('audio/ogg; codecs="vorbis"');

canPlayType returns one of three values:

  1. probably,
  2. maybe, or
  3. “” (the empty string).

We have these odd return values because of the general weirdness surrounding codecs. The browser can only guess at whether a certain codec is playable without actually trying to play it.

So to test for support, you could do this:

var audio = new Audio();
var canPlayOgg = !!audio.canPlayType && audio.canPlayType('audio/ogg; codecs="vorbis"') !== "";

All we’re doing here is checking that canPlayType is supported (!! effectively casts to a boolean) and then checking that canPlayType of our chosen format doesn’t return an empty string.
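To supply a source dynamically, one option is to run each candidate through canPlayType and keep the best answer. Here is a minimal sketch. The helper name pickSource and the { src, type } candidate objects are my own invention for illustration and are not part of the audio API.

```javascript
// Hypothetical helper (not part of the audio API): returns the first
// candidate the browser rates "probably", else the first rated "maybe",
// else null.
function pickSource(audio, candidates) {
  if (!audio || typeof audio.canPlayType !== 'function') {
    return null; // no HTML5 audio support at all
  }
  var maybe = null;
  for (var i = 0; i < candidates.length; i++) {
    var answer = audio.canPlayType(candidates[i].type);
    if (answer === 'probably') {
      return candidates[i];
    }
    if (answer === 'maybe' && maybe === null) {
      maybe = candidates[i];
    }
  }
  return maybe;
}

// In a browser you might use it like this:
// var audio = new Audio();
// var choice = pickSource(audio, [
//   { src: 'elvis.mp3', type: 'audio/mpeg; codecs="mp3"' },
//   { src: 'elvis.oga', type: 'audio/ogg; codecs="vorbis"' }
// ]);
// if (choice) { audio.src = choice.src; }
```

Preferring "probably" over "maybe" is a design choice rather than a rule; since both answers are guesses, you may be just as happy taking the first non-empty one.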

Current Browser Codec Support

Let’s check the codec support in the current crop of modern browsers.

Desktop browser audio codec support

Desktop Browser | Version | Codec Support
Internet Explorer | 9.0+ | MP3, AAC
Chrome | 6.0+ | Ogg Vorbis, MP3, WAV†
Firefox | 3.6+ | Ogg Vorbis, WAV
Safari | 5.0+ | MP3, AAC, WAV
Opera | 10.0+ | Ogg Vorbis, WAV

† WAV since Chrome 9

Mobile browser audio codec support

Mobile Browser | Version | Codec Support
Opera for Android | 14.0+ | Device-dependent††
Mobile Safari (iPhone, iPad, iPod Touch) | iOS 3.0+ | MP3, AAC
Android Stock browser | 2.3+ | Device-dependent††
Chrome for Android | all | Device-dependent††
Firefox for Android | all | Device-dependent††
Firefox OS | ? | ?
Windows Phone | ? | MP3, AAC
Blackberry | 6.0+ | MP3, AAC

†† Mobile browsers generally use the codecs provided by the host operating system. Various versions of Android include different codecs. Generally, MP3 and Ogg Vorbis can be assumed. See Supported Media Formats in the official Android documentation for more.

The good news is that at the time of writing, it’s estimated that around 80% of browsers now support HTML5 audio.

The bad news is that there is still no consensus on which codec to support, so you’ll need to provide both MP3 and Ogg Vorbis sources in order to take full advantage of HTML5 audio.

Containers, Formats, and File Extensions (oh, and MIME types again)

Above, I’ve referred to the audio formats as they’re commonly known, but technically we should refer to their container format. (A container can contain more than one format — e.g., MP4 can contain AAC or AAC+.)

Container | Format(s) | File Extensions | MIME Type | Codec String
MP3 | MP3 | .mp3 | audio/mpeg | mp3
MP4 | AAC, AAC+ | .mp4, .m4a, .aac | audio/mp4 | mp4a.40.5
OGA/OGG | Ogg Vorbis | .oga, .ogg | audio/ogg | vorbis
WAV | PCM | .wav | audio/wav | 1
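If you build your type strings in code, the table above can be turned into a small lookup. This is only a sketch: the MEDIA_TYPES map and the typeStringFor helper are hypothetical names of mine, the map covers just a subset of the extensions listed, and it expects a plain file name without a query string.

```javascript
// Lookup built from the container table; extend it to match the files
// you actually serve. (Illustrative names, not part of any API.)
var MEDIA_TYPES = {
  mp3: { mime: 'audio/mpeg', codec: 'mp3' },
  m4a: { mime: 'audio/mp4', codec: 'mp4a.40.5' },
  oga: { mime: 'audio/ogg', codec: 'vorbis' },
  ogg: { mime: 'audio/ogg', codec: 'vorbis' },
  wav: { mime: 'audio/wav', codec: '1' }
};

// Hypothetical helper: maps a file name to a type string suitable for
// the type attribute or canPlayType. Returns '' for unknown extensions.
function typeStringFor(fileName) {
  var match = /\.([a-z0-9]+)$/i.exec(fileName);
  if (!match) {
    return '';
  }
  var ext = match[1].toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(MEDIA_TYPES, ext)) {
    return '';
  }
  var entry = MEDIA_TYPES[ext];
  return entry.mime + '; codecs="' + entry.codec + '"';
}

// e.g. audio.canPlayType(typeStringFor('elvis.oga'));
```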

We have <audio> and we’re not afraid to use it!

Okay, we’ve done the minimum amount of work to get our audio element set up and playable. What else can we do? At the moment, we’re relying on the browsers’ default audio players, each of which looks and works a little bit differently than the rest. Perhaps we’d like to customise the experience and create our own. To help us do that, the <audio> element supports several different properties exposing its current state.

Some of the more commonly used properties:

Property | Description | Return Value
currentTime | playhead position | double (seconds)
duration | media duration | double (seconds); read-only
muted | is volume muted? | boolean
paused | is media paused? | boolean
volume | volume level | double (between 0 and 1)

Using these properties is pretty straightforward. For example:

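var audio = document.getElementById('my-audio-id');
var duration = audio.duration;

Once the browser has loaded the clip’s metadata, the variable duration holds the duration (in seconds) of the audio clip. Before that point (or on a new Audio() with no source set), it is NaN.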
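In a custom player, currentTime and duration usually end up in a time readout. Here is a small sketch of a formatter. The name formatTime is mine, and it assumes you want an m:ss display and a placeholder whenever the value is not a finite number (duration is NaN before metadata arrives and can be Infinity for live streams).

```javascript
// Hypothetical helper: turns a time in seconds into "m:ss".
// Non-finite or negative values (e.g. NaN before metadata has loaded)
// are shown as a placeholder.
function formatTime(seconds) {
  if (typeof seconds !== 'number' || !isFinite(seconds) || seconds < 0) {
    return '--:--';
  }
  var whole = Math.floor(seconds);
  var mins = Math.floor(whole / 60);
  var secs = whole % 60;
  return mins + ':' + (secs < 10 ? '0' : '') + secs;
}

// In a browser, with `label` being some element of your own:
// label.textContent = formatTime(audio.currentTime) + ' / ' + formatTime(audio.duration);
```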

Buffering, Seeking, and Time Ranges

The situation is improving in this area, as browser makers start to implement key parts of the spec.

The API provides attributes called buffered and seekable that can be used in situations where we want to ascertain which part of the media has been buffered or preloaded, or is ready to be played without delay.

Let’s first take a look at the buffered and seekable attributes. Both return a TimeRanges object. The TimeRanges object is a list of time periods containing start and end times that can be referenced by their indexes.

The buffered Attribute

The buffered attribute returns the time ranges that have been completely downloaded. A little bit of code:

// returns TimeRanges object of buffered media
var buffered = audio.buffered;

// returns time in seconds of the end of the last buffered TimeRange
// (end() requires an index, so check that at least one range exists)
var bufferedEnd = buffered.length > 0 ? buffered.end(buffered.length - 1) : 0;

The TimeRanges Object

The TimeRanges object contains data on the parts of buffered media in the form of one or more (you guessed it) time ranges. A TimeRanges object exposes one property and two methods:

  1. length — number of time ranges
  2. start(index) — start time in seconds of a particular time range
  3. end(index) — end time in seconds of a particular time range

You may be wondering in what situation the TimeRanges object would contain more than one time range. Imagine the user clicks forward to a portion of unbuffered media. The idea is that the media would then start buffering from that point, and you’d have two time ranges:

------------------------------------------------------
|=============|                    |===========|     |
------------------------------------------------------
0             5                    15          19    21

So in this case:

  • audio.buffered.length returns 2
  • audio.buffered.start(0) returns 0
  • audio.buffered.end(0) returns 5
  • audio.buffered.start(1) returns 15
  • audio.buffered.end(1) returns 19
  • audio.buffered.end(audio.buffered.length - 1) returns 19

Note that if the user is actively seeking through the media throughout the buffering process, a contiguous buffered progress bar actually makes little sense. Also consider that some browsers will read part of the end of the file to establish duration and so create two time ranges almost immediately. Now you’re starting to appreciate why making an accurate buffered progress bar is a little tricky!
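One way to cope with multiple ranges is to add them up rather than rely on a single end point. The following is a rough sketch. The names rangesToArray and bufferedFraction are hypothetical, and it assumes the ranges do not overlap, which is how the spec describes a TimeRanges object.

```javascript
// Copies a TimeRanges-like object into a plain array of { start, end }.
function rangesToArray(ranges) {
  var out = [];
  for (var i = 0; i < ranges.length; i++) {
    out.push({ start: ranges.start(i), end: ranges.end(i) });
  }
  return out;
}

// Hypothetical helper: fraction (0 to 1) of the media covered by the
// given ranges. Returns 0 while the duration is unknown.
function bufferedFraction(ranges, duration) {
  if (!ranges || !isFinite(duration) || duration <= 0) {
    return 0;
  }
  var list = rangesToArray(ranges);
  var total = 0;
  for (var i = 0; i < list.length; i++) {
    total += list[i].end - list[i].start;
  }
  return Math.min(1, total / duration);
}

// In a browser:
// var fraction = bufferedFraction(audio.buffered, audio.duration);
```

For the two ranges in the diagram above (0 to 5 and 15 to 19, with a duration of 21), this gives 9 / 21. If you would rather draw each range as its own segment of the bar, the array from rangesToArray is a convenient starting point.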

You can check out TimeRanges in real-time using this handy HTML5 Media Event Inspector.

Seeking and Seekable

Seeking is the act of looking forward (or backward) in a media file. This usually happens when a section of media is requested before it’s finished loading.

The seeking attribute can be used to determine whether that part of the media is being actively “seeked”. When it returns true, the portion of the media the user requested is still being loaded.

To recap a little, the buffered property tells us what’s been downloaded and is often used as an indication of what part of the media can be directly played. If the browser supports it, however, it may make more sense to use the seekable attribute to determine which parts of the media can be jumped to and played immediately.

seekable returns a TimeRanges object of time ranges that can be played immediately. This uses a technology known as byte-range requests, which allows part of the content to be requested over HTTP. In short, we don’t have to load all the data prior to the desired part in order to play it.

An example:

// Is the player currently seeking?
var isSeeking = audio.seeking;

// Is the media seekable?
var isSeekable = audio.seekable && audio.seekable.length > 0;

// Time in seconds up to which the media is seekable
// (end() requires an index, so guard against an empty TimeRanges)
var seekableEnd = isSeekable ? audio.seekable.end(audio.seekable.length - 1) : 0;

Time ranges can be confusing. audio.seekable.end(audio.seekable.length - 1) actually tells us the end point of the last time range (not the end point of all seekable media). In practice, though, this is good enough, as the browser either enables range requests or doesn’t. If it doesn’t, then audio.seekable will be equivalent to audio.buffered, which will give a valid indication of the end of seekable media.

Note that media being in a “seekable” state is different than media being in a “buffered” state. Media doesn’t have to be buffered to be seekable.
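Before jumping the playhead, you might check whether the target time falls inside a seekable range. A minimal sketch, assuming a hypothetical helper called canSeekTo:

```javascript
// Hypothetical helper: true if `time` (in seconds) lies inside any of
// the element's seekable ranges.
function canSeekTo(audio, time) {
  var ranges = audio.seekable;
  if (!ranges) {
    return false; // attribute not supported
  }
  for (var i = 0; i < ranges.length; i++) {
    if (time >= ranges.start(i) && time <= ranges.end(i)) {
      return true;
    }
  }
  return false;
}

// In a browser:
// if (canSeekTo(audio, 90)) { audio.currentTime = 90; }
```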

Buffered and seekable data can be useful information, but it would be too easy if there weren’t a few gotchas.

  1. Preloading is not supported in all older audio-capable browsers (Opera 10/Firefox 3.6).
  2. The buffered attribute is not always supported (Blackberry PlayBook).
  3. Not all HTML5 browsers allow byte-range seeking — for example, Safari on Windows. In this case, you can still seek within the downloaded content. It downloads from the start until the end, as does Flash.

If you want to deliver the best possible solution to your users, you’ll need to feature test.
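A feature test can be as simple as checking which attributes the element exposes. The sketch below uses a hypothetical detectMediaFeatures helper. Bear in mind that the presence of an attribute only tells you it exists, not that the browser implements it well.

```javascript
// Hypothetical helper: reports which parts of the media API an audio
// element appears to expose.
function detectMediaFeatures(audio) {
  var usable = audio !== null && typeof audio === 'object';
  function has(name) {
    return usable && name in audio;
  }
  return {
    audio: usable && typeof audio.canPlayType === 'function',
    preload: has('preload'),
    buffered: has('buffered'),
    seekable: has('seekable'),
    played: has('played')
  };
}

// In a browser:
// var features = detectMediaFeatures(document.createElement('audio'));
// if (!features.buffered) { /* hide the buffered progress bar */ }
```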

A Note about Preloading

I mentioned the preload attribute in the previous article. This attribute accepts three possible values:

  1. none — Do not preload any media. Wait for a play event before downloading anything.
  2. metadata — Preload just the metadata. Grab the start and the end of the file via range-request and determine the duration.
  3. auto — Preload the whole file. Grab the start and the end of the file to determine duration, then seek back to the start again for the preload proper.

When it comes to preloading, remember that it’s a request or hint to tell the browser how you’d like to preload the media. The browser is not under any obligation to fulfill that request. For this reason, certain browsers may handle these requests differently.

Well-Played

It’s worth mentioning the played property in passing. This property tells us which time ranges have been played within the media. For example:

// returns a TimeRanges object
var played = audio.played;

Media Events

Underpinning both the native audio and video APIs is a comprehensive set of events. A full list can be found in the WHATWG version of the spec.

Attaching handlers to media events allows you to easily react to changes in state. For example, it might be useful to update the time info in a custom-built player each time the timeupdate event occurs.

A quick summary of the more commonly used media events:

Event | Description
durationchange | The duration attribute has been updated.
ended | Playback has stopped as the end of the media was reached.
pause | The media playback has been paused. Note there is no stop event.
play | The media has started playing.
timeupdate | The current playback position changed (usually every 250ms).
volumechange | The volume changed.

Other useful events:

Event | Description
canplay | The media can be played but may need to pause while the file is downloaded.
canplaythrough | At current download rates, it is estimated that the media can be played from start to finish without pause.
progress | The browser is fetching the media data (usually every 250ms).
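To make the event handling concrete, here is a sketch of wiring a custom time display to an audio element. The function bindTimeDisplay and its two callbacks are hypothetical names; only addEventListener, removeEventListener, and the event names come from the API.

```javascript
// Hypothetical helper: calls onUpdate(currentTime, duration) whenever
// the playhead or duration changes, and onEnd() when playback finishes.
// Returns a function that detaches the handlers again.
function bindTimeDisplay(audio, onUpdate, onEnd) {
  function handleTime() {
    onUpdate(audio.currentTime, audio.duration);
  }
  function handleEnded() {
    onEnd();
  }
  audio.addEventListener('timeupdate', handleTime, false);
  audio.addEventListener('durationchange', handleTime, false);
  audio.addEventListener('ended', handleEnded, false);

  return function unbind() {
    audio.removeEventListener('timeupdate', handleTime, false);
    audio.removeEventListener('durationchange', handleTime, false);
    audio.removeEventListener('ended', handleEnded, false);
  };
}

// In a browser:
// var unbind = bindTimeDisplay(audio, function (time, duration) {
//   /* update your readout here */
// }, function () {
//   /* reset the play button here */
// });
```

Returning an unbind function is just one way to tidy up; it saves you from keeping references to the individual handlers.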

Again, the HTML5 Media Event Inspector is your friend.

Additionally, you can test browser support by using areweplayingyet.org. It’s worth checking out the code behind the tests to understand the goings-on under the hood.

Streaming

Streaming audio is a common requirement. This is an area that until recently has been dominated by Adobe’s Flash technology. Proprietary server technologies and protocols are well-established. One of the leading examples of this is Adobe’s Real Time Messaging Protocol (RTMP). However, current browsers only support streaming over HTTP, and so something like Flash is required to process RTMP streams.

Currently, audio-enabled browsers only support SHOUTcast and Icecast servers that stream audio over HTTP. SHOUTcast is proprietary and allows streaming of MP3 and AAC files only, while Icecast is non-proprietary and supports Ogg Vorbis as well as MP3 and AAC.

Stream | AAC | MP3 | Ogg Vorbis | Native Support
Icecast | Yes | Yes | Yes | Yes
SHOUTcast | Yes | Yes | No | Yes
RTMP | Yes | Yes | No | No (requires Flash)

An Evolving Spec (Or, “Whoa, this thing is moving!”)

As the audio spec is still evolving, there are a couple of inconsistencies in browser implementations to watch out for. These generally affect older browsers.

The load Method

Until fairly recently, the load method was required in order to tell the browser about a change in the media source and to get it to update appropriately. Now, media.load() causes the element to reset and to start selecting and loading a new media resource from scratch.

So for older browsers, such as Firefox 3.6, in order to change the media source, you’ll not only need to change the source value but also issue a load command:

var audio = new Audio();
audio.setAttribute("src","mynew.mp3");
audio.load(); // required for 'older' browsers

When Browsers Go Off-Spec

When creating cross-browser solutions, it’s useful to know how closely each browser follows the W3C spec and where it deviates.

Autoplay and Volume

iOS and Blackberry devices ignore the autoplay attribute and media element volume changes, whereas Android supports autoplay but not volume changes.

Effectively, on Blackberry and iOS devices, autoplay functionality is disabled as both require a user-initiated event to “kick it off”. This has the unfortunate side-effect that any audio suffers a delay between user interaction and playback.

To mitigate these issues, you may want to take a look at Doctor Remy Sharp’s excellent article on Audio Sprites and fixes for iOS.
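Where playback has to be kicked off by the user, one simple pattern is to call play() from inside the first interaction. This is a sketch only: playOnFirstGesture is a hypothetical helper, and it assumes a click (or whichever event name you pass in) counts as user-initiated on the device in question.

```javascript
// Hypothetical helper: starts playback the first time `eventName`
// fires on `target`, then removes itself.
function playOnFirstGesture(target, audio, eventName) {
  var name = eventName || 'click';
  function handler() {
    target.removeEventListener(name, handler, false);
    var result = audio.play();
    // Some browsers return a promise from play(); swallow a rejection
    // so that a blocked start does not surface as an uncaught error.
    if (result && typeof result.catch === 'function') {
      result.catch(function () {});
    }
  }
  target.addEventListener(name, handler, false);
}

// In a browser:
// playOnFirstGesture(document.getElementById('start-button'), audio);
```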

Simultaneous Playback of Multiple Audio Elements

A particular annoyance for developers of anything but the simplest audio web apps: not all browsers will play multiple audio elements simultaneously. This is a particular problem for games developers. Again, we see this issue with iOS and Blackberry devices.

Operating System Dependence

Safari (5+) relies on QuickTime being installed. This is rarely a problem unless you are running it on Windows.

Internet Explorer (9+) relies on the codecs being present at the operating system level. As it only runs on Windows, this is fortunately almost always the case.

What’s New?

There are a few new features and whole new APIs being specified for web-based audio, so let’s take a look at what’s around the corner.

A Change of Pace

A couple of noteworthy upcoming features are playbackRate and defaultPlaybackRate. As you can probably imagine, these fellas let us alter the speed and direction of playback. This functionality could be used for fast-forward and rewind functions or perhaps to allow users to tweak the playback speed so they can fit more podcasts into their day.

  • audio.playbackRate returns 1 at normal speed and acts as a multiple that is applied to the rate of playback. For example, setting playbackRate to 2 would double the speed, while setting it to -1 would play the media backwards.
  • audio.defaultPlaybackRate is the rate at which the audio will play after you pause and restart the media (or issue any event for that matter).
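Since support for these attributes varies, it makes sense to set the rate defensively. A minimal sketch, assuming a hypothetical setSpeed helper; some browsers reject rates they cannot handle (negative ones in particular), so the assignment is wrapped in a try/catch.

```javascript
// Hypothetical helper: tries to set the playback rate and returns the
// rate actually in effect afterwards (1 if the attribute is missing).
function setSpeed(audio, rate) {
  if (!('playbackRate' in audio)) {
    return 1; // not supported: playback stays at normal speed
  }
  try {
    audio.playbackRate = rate;
  } catch (e) {
    // The browser refused this rate; keep whatever was set before.
  }
  return audio.playbackRate;
}

// In a browser:
// setSpeed(audio, 1.5); // podcasts, but faster
```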

Media Fragments

Currently, if we wish to reference some part of a media file, we often first have to download at least some of the media we don't actually need. The W3C Media Fragments proposal was created to address this issue and others, such as the retrieval of an area of video or just the associated text track. In the case of audio, it will allow us to specify which parts we want to download using parameters on the source URI.
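For the time dimension, the Media Fragments syntax appends #t=start,end (in seconds) to the URI. Here is a sketch of a helper that builds such a URI. The name withTimeFragment is hypothetical, it does not validate that end is later than start, and whether the browser honours the fragment depends on its level of support.

```javascript
// Hypothetical helper: builds a temporal Media Fragment URI.
// Any fragment already on the URL is replaced.
function withTimeFragment(url, start, end) {
  var base = String(url).split('#')[0];
  var from = (typeof start === 'number' && start > 0) ? start : 0;
  var fragment = '#t=' + from;
  if (typeof end === 'number') {
    fragment += ',' + end;
  }
  return base + fragment;
}

// In a browser:
// audio.src = withTimeFragment('elvis.mp3', 10, 20);
```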

Advanced Audio APIs: The Future Sound of Browsers

When it comes to more advanced audio functionality, Mozilla was first to enter the fray with an early experimental Audio Data API for Firefox. This was intended to facilitate audio manipulation via JavaScript and created a platform for low-level tinkering.

Not to be outdone, Google released the Web Audio API for Chrome and put it forward as a W3C proposal. The Web Audio API is a higher-level implementation that takes an audio-node based approach and provides many useful functions.

Both implementations address the desire by developers to do more with audio — to create, manipulate, analyse, and mix audio on the fly. In effect, we may one day be able to do in a browser anything we can currently do with native applications.

So what if we want to play with advanced audio in the browser? Well, two separate implementations currently exist. If you want to write applications that use both APIs in any reasonable way, you need third-party bridging libraries to abstract away the differences.
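If you just want to know which of the two is available, a rough detection sketch might look like this. It assumes the Web Audio API is exposed as AudioContext or the prefixed webkitAudioContext, and that the Audio Data API shows up as a mozSetup method on the audio element. The function name and the strings it returns are labels of my own.

```javascript
// Hypothetical helper: reports which advanced audio API appears to be
// available. `win` is the global object (window in a browser) and
// `audio` is an audio element.
function detectAdvancedAudio(win, audio) {
  if (win && (win.AudioContext || win.webkitAudioContext)) {
    return 'web-audio';
  }
  if (audio && typeof audio.mozSetup === 'function') {
    return 'audio-data';
  }
  return 'none';
}

// In a browser:
// var api = detectAdvancedAudio(window, new Audio());
```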

Also fairly new on the scene is Mozilla's MediaStream Processing API, which takes another approach to audio and video streams to allow us to manipulate at both high and low levels.

In fact, using advanced APIs and taking advantage of the speed of modern JavaScript engines, we can even write decoders for unsupported codecs. Currently, we have the JSMad MP3 decoder and the lossless Alac.js decoder. Apparently, FLAC and AAC are in the pipeline. There is hope, then, that in the future, we may not have to worry about which browsers are supporting which formats.

For those curious about advanced web audio, I wrote about the differences in approaches and the different libraries that allow us to write cross-browser solutions in HTML5 Audio APIs: How Low can we Go?

The good news is that all relevant parties are talking, and things seem to be buzzing on the W3C Audio Working Group. It looks like everyone is coming together to put their ideas into a unified Audio Processing API, which is rapidly approaching publication as a Web Standard.

Summary

Although browser implementations of the current HTML5 audio spec are improving, there are still a few issues to watch out for when creating comprehensive cross-browser solutions. You certainly need to be aware of browser limitations on platforms like iOS. Hopefully, as support matures, these issues will disappear.

Positive news on the "advanced audio" front too: a new standard is being established, and we're close to getting a unified spec for browser makers to work from. In the meantime, JavaScript libraries are available to help bridge the gaps between existing implementations.

There are also signs that we may be seeing the end of the requirement for different browser-dependent codecs. Opera Mobile already supports MP3 where the underlying OS supports it, and Mozilla look like they may enable this too. If this option exists for mobile browsers, it's not a huge stretch to imagine that desktop browsers will follow suit.

So lots happening and lots already achieved. The rough edges of existing browser implementations are being smoothed, consensus and standards are being forged, and we can see the foundations of some very exciting new technologies being established.

The future of web-based audio looks bright!

27 Responses on the article “HTML5 Audio — The State of Play”

Claudio Poli says

Awesome article, JavaScript decoders can be a nice alternative while we wait for browser makers to get an agreement on something ;)

Otto Nascarella says

Very good article.
Cheers!

Martin Pagh Ludvigsen says

This article is a must read for creative developers. Thorough and very well written. Thanks a lot.

Cayzland Studio says

Note, that our problem-child Internet Explorer strikes, if you put the parameter “preload=’none’” to the audio tag.

For example: You will provide more than 10 audios on one page. And if a user has a slow internet connection, a full preview downloading would kill nearly all. It would take too much time.

So, the best thing to do is set the preload to none. And of course we can’t do that the simple way, because – like ever – Microsoft still knows better what the users and developers want… and so the IE will just do nothing if the parameter is set. The player keeps black and without any controls…

So we have to put the preload to “metadata”. For me this is the most usable solution. Try and tell me what you think!

LukaszC says

I wasn’t able to play audio using JavaScript in IE9 and Safari (both versions in Windows 7). Neither IE9 nor Safari for Windows support the canPlayType method, and it’s always safe to wrap it in a “try” function and use an alternative audio player (such as SoundManager2) when an error is detected.

Mundstrom says

good read. Amazed that MP3 still isn’t supported across the board… I mean it’s not like anybody is in doubt about how widespread it is.

Mark Boas says

@Cayzland Studio Yep you’re absolutely right. The preload=”none” attribute on IE9 results in the web native player not being rendered. (Example http://jsfiddle.net/Df8yC/)

Via JavaScript you could issue the pause() command on each audio element but I suspect this has the effect of loading at least the meta-data. (Example http://jsfiddle.net/tzjY2/)

@LukaszC the canPlayType method does work for me in IE9 on Win 7 (Example http://jsfiddle.net/PSWUn/) It may however depend on your installation as I’ve heard of issues with Win 2008 server and just weirdly broken installations of IE9. As far as Safari on Win 7 – it is failing for me too, probably something to do with QuickTime not being installed (which Safari relies on to play media).

I think the idea to wrap everything in a try catch for maximum robustness is a good one, in fact that’s what we do with jPlayer.

Wojciech Fornal says

I’ve spotted a little typo (so far) at the beginning of the article:

var audio = document.getElementByTagName(‘audio’)[index];

should be:

var audio = document.getElementsByTagName(‘audio’)[index];

zikzak says

Nothing about Opus ? Firefox can play it and the audio quality is outstanding even with very low bitrate !

Kumar says

BTW, some servers dont support most audio formats, isn’t it? so, what do we do in that case?

Cayzland Studio says

Kumar,

you can add the MIME types in the htaccess to make them run on a server.

Add these lines to the htaccess (example)

AddType audio/mpeg mp3
AddType audio/mp4 m4a

Restart the apache (service) and the MIME types will work

Kumar says

Thanks Cayzland. I added the formats required for my website in the format you have given. It works fine now.

TagLoomis says

It’s unfortunate that neither Icecast or Shoutcast metadata is supported.

Sure would be nice to know what a live stream is playing.

iPhone sends; icy-metadata: 1 in the audio request and both Icecast and Shoutcast return icy-metaint: %d.

But there doesn’t seem to be any methods for getting the Metadata.

Asheesh says

I must say that this is one of the most helpful articles on audio tags…hell, I will go ahead and remove “one of the” part..It was “the” most helpful articles on audio tags…Its comprehensive yet concise..Easy to understand and similarly effective. Thanks to you that my project (I have to give BG music and SFX effects in my office’s cultural fest) is complete, and I can sleep peacefully now, instead of searching each and everything mercilessly on Google :)

Anthony says

I use .ogg as it’s more compatible in my view. Mp3 is very ropey with html5.

Rodrigo says

“iOS and Blackberry devices ignore the autoplay attribute and media element volume changes”

Wow, this was really useful to know. I was wondering why my “mute” button didn’t work on iPad. I guess I’ll have to pause the audio then. Thanks for that info!

Phil says

This is easily the most informative yet brief and to the point article on the topic I’ve found. Adding the MIME types to the htaccess is surely going to be good to know.

Mark says

I currently have the following working for streaming (pipes swapped for html tags):

|audio autoplay="autoplay"|
|source src="http://streaming203.radionomy.com/Air-Jazz?group=0&countrycode=US" type="audio/mp3"|
|/audio|

Sometimes however it takes up to 10 seconds to play initially because it’s buffering.

How can I have an animated gif icon on my webpage that ‘s displayed while the stream is buffering then disappears once the stream is actually playing?

Mark says

Follow up to my post above, I was able to use the onloadstart event to listening for the start of buffering. However the oncanplay has been also firing at the same time. I was hoping it would fire when the music started playing so I could change the icon.

Sam Wilson says

Thanks for the helpful article. I think the only thing missing is some information about the fallback solution and how that works. At the very least, a link to another, similarly helpful, resource would help round out the article.

windylcx says

wow,what a nice article!

guowei says

my server is tomcat and audio tag can’t work
can you help me?

Arindam Mojumder says

My loading calculation is not working on Chrome.

if ((theAudio.buffered != undefined) && (theAudio.buffered.length != 0))
{
    var currentBuffer = theAudio.buffered.end(0);
    var maxduration = theAudio.duration;
    var perc = 100 * (currentBuffer / maxduration);
    $("#ap-" + params.id).find('.songloadbar').css({ width: perc + "%" });
}

var currentBuffer = theAudio.buffered.end(0); This is showing error in Chrome only. The other browser is working fine, even the IE9.

Kindly tell me how I can show / calculate the loaded part of the audio?

Luca Baz00ka says

I need to create a simple app with an <audio> tag.
The objective is that the user either on mouse over or using the access key gets the specific sound. For example for a “A” one presses “a” on the keyboard and the sound “a” is played.

Here is the code.

I’ve tested it on Google Chrome, Firefox, BB9000 emulator and also on a real BB9000 and got consistent results:

1) on all those devices I see the UI and if I click on the “play” it works.

2) neither on BB9000 simulator nor on the “real thing” i can see the content of the title tag

3) on none of those environments the accesskey works.

Any advice is appreciated.

Daniel says

I need some help.. I have an audio file .wav in adpcm (8khz) format ,how can i play it from the browser with the audio tag?
