It's Curtains for Marital Strife Thanks to getUserMedia

by .

True story: I was tasked by the lovely Mrs Lawson to buy some curtains that match our carpet during the January sales. I dutifully did so — and had to return to the shop straight away because they didn’t match at all. Mrs Lawson accompanied me, and with a withering glance at her incompetent mate, immediately found some correctly hued fabric, and all was well.

But what’s a middle-aged colour-blind bloke to do? I had early in the curtain procurement process decided against cutting a hole in the carpet in order that I may take a sample with me. (All other mistakes aside, this was a correct decision.)

So, in order to ensure that I would never again repeat the mistake, I set out to make an app that would allow me to capture the colour of an image straight from my camera. Of course, it had to be a web app rather than a native app, because we’re web angels, not proprietary devils.

getUserMedia

getUserMedia is an API that gives a web page access to a user’s camera and microphone via JavaScript. It’s supported in Opera 12 and Opera Mobile 12 for Android, and WebKit in Chrome Canary builds (instructions).

Like many other APIs, it’s not part of the “real” HTML5 spec. It started life as the HTML5 <device> element, then got moved into the W3C as part of the webRTC specifications. But let’s not taxonomise when we could be having fun.

The first thing we need to do when using a super-cool new API is to detect whether it’s supported:

if (navigator.getUserMedia) {
  // do something cool
} else {
  // fallback code
}
Basic usage of getUserMedia

For this article, our “do something cool” will be to grab the dominant colour of the current input stream from the camera. As a fallback, we have several options.

One option, if we’re using the default Android browser, is to show a form that automatically starts the camera when focussed:

<form enctype="multipart/form-data" method="post">
  <input type="file" accept="video/*;capture=camera" />
</form>
Fallback code for Android

There’s also the capture attribute, a proposed extension from the W3C Device API Media Capture API that tells a user agent where to get the input data. Android instead extends the accept attribute, which is the form used in the markup above. See more in David Calhoun’s Device API article. Alternative capture arguments are camcorder (for video), microphone, and filesystem.

Browsers that don’t understand the still-draft Media Capture API will simply ignore the capture attribute and prompt the user to browse to a file on their file system for uploading.

Users on iOS are out of luck, as their devices don’t allow access to the file system.

Differences Between the Media Capture API and getUserMedia

Robin Berjon, co-chair of the Device API (DAP) Working Group explains: The markup-only Media Capture API is a simpler, low-hanging-fruit version that can work easily for a lot of use cases. It also has a very simple security model (just a file upload).

In fact, it works exactly like <input type=file>, except that it knows to allow the user to grab a pic from the camera by default instead of hitting the filesystem. But just like with <input type=file>, once a picture has been taken (i.e., a file selected), you can access it and read from it using the File API, which means you can stuff it in an <img> or a <canvas> and do nasty things to it without annoying any faraway server.

But of course, if it’s part of a form and you submit it, it gets submitted just like a regular file upload.

The thing is, though, it’s always a snapshot. Even if you use it to take a video (which you can), the page only gets it after you’ve finished shooting. With getUserMedia (gUM), you get the live stream, which you can modify, record, or stream elsewhere. The gUM approach is more powerful but more complex, and it has more severe security issues.

getUserMedia API

Let’s assume (because we’re optimists) that your user has a getUserMedia-enabled device. The API is nice and simple, as all Web APIs should be.

navigator.getUserMedia accepts two required arguments and an optional third.

The first argument tells the device which media you require access to, and it’s passed as a JavaScript object. So, if you only require access to the microphone, the first argument would be {audio: true}; for video-chatting, you would use {audio: true, video: true}.

The device decides which camera to use: “User agents are encouraged to default to using the user’s primary or system default camera and/or microphone (as appropriate) to generate the media stream.”

A previous version of the specification allowed hints to user agents about which camera to use. The API could specify “user” (the front camera on a phone) or “environment” (the rear camera on a phone). Debate continues about whether to reinstate this (read the ongoing conversation). One argument against is that letting sites know what cameras a device has could help Dr Malice of Evil Corp. fingerprint users after luring them to his site (which I find to be overly-cautious). Harald Alvestrand poses a conundrum: conferencing units may have a room camera, a document camera and a lectern camera, neither of which is attached to the physical box; which one of these is front?

But let’s not worry at the moment. Devices with more than one camera generally have a mechanism that allows the user to choose which one is used, so this gives the user control. Note also that the spec says “User agents may allow users to use any media source, including pre-recorded media files.”

The second argument is the success callback, the function to be executed on success, assuming that the user allows access and the device supports your request.

There is an optional third argument. This is the failure callback, the function to be executed if something went wrong. It’s optional, but only optional in the sense that washing your hands before you eat is optional: if you don’t do it, you leave yourself open to all sorts of bugs and your mum will shout at you.

Of course, it’s important to do feature detection before making an API call to ensure that the browser and device are capable, but even if feature detection succeeds, there is still much that can cause failure. For example, the user could deny permission or turn off the camera via a hardware switch, or the device itself could disable the camera.

So here’s how we recommend you use the API:

navigator.getUserMedia({audio: true, video: true}, success, error);

function success(stream) { 
  // ... use 'stream' ... 
} 

function error(){ 
  // display fallback content 
}
Using the gUM API

To check whether it’s working correctly, we can attach the output of the camera’s stream to the input of an HTML5 video on the page, like this:

<video autoplay></video>
<script>
if (navigator.getUserMedia) {
  navigator.getUserMedia({audio: true, video: true}, success, error);
}

function success(stream) {
  // Attach the camera's stream to the first <video> element on the page
  var video = document.getElementsByTagName('video')[0];
  video.src = stream;
}

function error() {
  // display fallback content
}
</script>
Streaming captured video to an HTML5 <video> element

Differences Between WebKit and Opera

The two rendering engines that implement getUserMedia do so in two different ways. Hopefully, they’ll harmonise when the features reach the released browsers.

The differences:

  • WebKit uses a prefix — navigator.webkitGetUserMedia — while Opera doesn’t.
  • WebKit implements the options according to an old version of the spec, in which audio and video were passed as strings:
    navigator.webkitGetUserMedia("video, audio", /* ... */)
    whereas Opera implements the most recent version of the specification which uses JavaScript objects:
    navigator.getUserMedia({video: true, audio: true}, /* ... */)
    (Added June 2012: Chrome 20+ uses the {video: true, audio: true} syntax.)
  • WebKit attaches the resulting stream like this:
    video.src = window.webkitURL.createObjectURL(stream);
    which is according to the current version of the spec. Opera uses:
    video.src = stream;
    and has proposed simplifying the spec (but discussion continues).

But this fragmentation, even though it’s only in experimental browser releases, is, like, a total drag, man.

Introducing The gUM Shield

Fortunately, we’ve got your back. Two of the finest minds I know — actually, two of the finest Mikes, Doctor Mike Robinson and Opera’s Mike Taylor — have created a snippet of script that you can copy to shield you from these annoying differences. We’ve named this getUserMedia syntax normalisation script snippet The gUM Shield.

It assumes that there’s a variable called video to which you’ve assigned a reference to the <video> element you’ll stream to. Do this with some mechanism like:

video = document.getElementById('video');

Simply hit the ‘Hub, download it, plug it into your script, and check it out!

Note that it’s experimental. Who knows what’ll happen when Mozilla, Microsoft, and Apple implement it, but please, have a play. Fork it and improve it!
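
To give a flavour of what a normalisation script like this has to do, here is a minimal sketch based only on the differences listed earlier. It is not the gUM Shield itself: the function name startCamera and its parameters are made up for illustration, and it assumes a WebKit build that accepts the object syntax (Chrome 20+).

```javascript
// Illustrative only: startCamera is a hypothetical helper, not the gUM Shield.
// nav and win are normally the browser's navigator and window objects; they
// are passed in so the function is easy to try out with stand-ins.
function startCamera(nav, win, video, onError) {
  var options = {video: true, audio: true};

  if (nav.getUserMedia) {
    // Unprefixed (Opera 12): the stream is assigned to src directly.
    nav.getUserMedia(options, function (stream) {
      video.src = stream;
    }, onError);
    return true;
  }

  if (nav.webkitGetUserMedia) {
    // Prefixed (WebKit): the stream is wrapped in an object URL first.
    // Assumes a build that takes the object syntax (Chrome 20+).
    nav.webkitGetUserMedia(options, function (stream) {
      video.src = win.webkitURL.createObjectURL(stream);
    }, onError);
    return true;
  }

  return false; // no getUserMedia at all: show fallback content
}
```

In a page you would call startCamera(navigator, window, video, error) and show your fallback form whenever it returns false.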

Doing Exciting Things with the Video Stream

So far, we’re only echoing the video stream. But, as you’ll know from Tab Atkins’ guest article video + canvas = magic, if we copy the video into a canvas, we can manipulate it:

var video = document.querySelector('video');
var canvas = document.querySelector('canvas');
var ctx = canvas.getContext('2d');

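// videoWidth and videoHeight give the size of the frame itself;
// video.width only reflects the element's width attribute
ctx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight,
  0, 0, canvas.width, canvas.height);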
Canvas and video magic

A common mistake is assuming that this somehow establishes a “live connection” between the video stream and the canvas. It doesn’t. It only grabs the current frame. So you need to grab the current frame repeatedly in order to replicate a video. The magical number is 15 times a second (or every 67 milliseconds), but a slower frame rate would probably be okay for this demo:

setInterval(function () {
  // videoWidth/videoHeight: the frame's own size, not the element's attributes
  ctx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight,
    0, 0, canvas.width, canvas.height);
}, 67);
Live updating of <canvas> from a <video>
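
If you’d rather state the frame rate than the delay, the interval can be derived from it. The sketch below wraps the loop in a hypothetical helper, startGrabbing, that returns a function to stop the copying; the name and the fps parameter are mine, not part of any API.

```javascript
// Hypothetical helper: copies the current video frame to the canvas `fps`
// times a second and returns a function that stops the copying.
function startGrabbing(video, canvas, fps) {
  var ctx = canvas.getContext('2d');
  var delay = Math.round(1000 / fps); // 15 fps gives 67 ms

  function grab() {
    // Five-argument form: scale the whole frame to fill the canvas.
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  }

  grab(); // draw one frame straight away
  var timer = setInterval(grab, delay);

  return function stop() {
    clearInterval(timer);
  };
}
```

Calling startGrabbing(video, canvas, 15) gives the same rate as the loop above, and keeping hold of the returned function lets you stop the copying once you have the frame you want.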

See the working demo.

What I want to do now is to grab the dominant colour from the canvas. I’m stealing a demo by Shwetank Dixit (who borrowed from Lokesh Dhakar’s Color Thief demo). We won’t look at how it determines the dominant colour (because that’s nothing to do with getUserMedia).
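
For a rough idea of what any such routine has to work with, though: the canvas hands back its pixels as one flat array of red, green, blue and alpha values. The sketch below is not Color Thief’s algorithm. It is a deliberately naive average, and averageColour is a made-up name.

```javascript
// Naive stand-in for a dominant-colour routine: the mean of every pixel.
// `data` is a flat RGBA array such as the `data` property of the object
// returned by ctx.getImageData(0, 0, canvas.width, canvas.height).
function averageColour(data) {
  var r = 0, g = 0, b = 0;
  var pixels = data.length / 4;

  for (var i = 0; i < data.length; i += 4) {
    r += data[i];     // red
    g += data[i + 1]; // green
    b += data[i + 2]; // blue; data[i + 3] is alpha, ignored here
  }

  if (pixels === 0) {
    return [0, 0, 0];
  }
  return [Math.round(r / pixels), Math.round(g / pixels), Math.round(b / pixels)];
}
```

You would feed it ctx.getImageData(0, 0, canvas.width, canvas.height).data. Bear in mind that an average can come out as a muddy colour that appears nowhere in the picture, which is one reason to prefer a proper dominant-colour routine.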

Try this in Opera Labs and Chrome Canary!

Unfortunately, this works splendidly in Opera but it occasionally fails in Chrome, and we’re not sure why. (We’ve emailed the Chrome team to ask them.) Apart from the rather anti-climactic end to an article, you shouldn’t be dismayed by this. getUserMedia is only available in experimental builds that have plenty of work to be done before they hit production.

Back to our demo. Once you’ve got the dominant colour, you could prompt the user for a name for that colour (“carpet”, for example) and store that in local storage using its name as a key. Then, when you find a pair of curtains in the shop, you can find their dominant colour and easily compare it with that stored against “carpet” from earlier.
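
As a sketch of that last step, assuming colours are kept as [r, g, b] arrays: the helper names below are hypothetical, and the distance measure is the crudest one available, chosen only to show the idea.

```javascript
// Hypothetical helpers. `storage` is normally window.localStorage; any object
// with setItem/getItem works. Colours are [r, g, b] arrays.
function saveColour(storage, name, rgb) {
  storage.setItem(name, JSON.stringify(rgb));
}

function loadColour(storage, name) {
  var stored = storage.getItem(name); // null if nothing was saved
  return stored === null ? null : JSON.parse(stored);
}

// Straight-line distance in RGB space: 0 means identical. This is a crude
// measure and does not model how people actually perceive colour.
function colourDistance(a, b) {
  var dr = a[0] - b[0], dg = a[1] - b[1], db = a[2] - b[2];
  return Math.sqrt(dr * dr + dg * dg + db * db);
}
```

At home you would call saveColour(localStorage, 'carpet', colour); in the shop, colourDistance(loadColour(localStorage, 'carpet'), curtainColour) gives a number where smaller is closer. How small counts as a match is something you would have to find by experiment.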

And you can return home, confident of delighting the love of your life with your chromatically appropriate choice of soft furnishings. That’s HTML5: enriching marriages since 2004.

[Image: a woman says 'hey daddy-o, the curtains really razz my berries, ya dig?'. Man replies 'Baby, I totally dig. Early night?'. A thought bubble shows him thinking 'Thank you, getUserMedia']

Harshing Your Mellow

Things need refining before web apps with getUserMedia are on a par with native applications. At the moment, there is no way of controlling which camera is to be used (and no agreement that a developer should be able to). The camera’s flash mechanism isn’t programmatically controllable, which could harm the user experience. Most important are the privacy implications of web pages being able to access your camera. Opera has an experimental privacy UI, but Chrome has no UI yet.

More Demos

See Paul Neave’s HTML5 Webcam Toy and the links at the foot of my Opera Labs article.

Thanks

Thanks for help and code are due to Robin Berjon, Shwetank Dixit, Daniel Davis, Lokesh Dhakar, Mike Taylor, Mike Robinson, and .

Addy Osmani’s Polyfill

Added 13 March 2012: Addy Osmani has published a getUserMedia polyfill called getUserMedia.js, which uses Flash if gUM isn’t available. It’s not ready for production yet, as IE is still being tested. Forkers may hit the ‘hub.

28 Responses on the article “It's Curtains for Marital Strife Thanks to getUserMedia”

  • Fabrice says:

    Pretty cool!

  • Matt Machell says:

    With the demo, would using requestAnimationFrame be more suitable/performant for updating the video stream than a simple setInterval?

    For those unaware of it, there’s a good article on it over here: http://paulirish.com/2011/requestanimationframe-for-smart-animating/

  • Federico says:

    Really cool… worth playing with

  • Will be nice once this is supported on Webkit/Safari on iOS devices. I have a web app I want users to be able to upload videos directly from their iPhone and if this worked, that would be fantastic.

  • Cody says:

    Thx for the great article Bruce, it helps a ton!… and ur HTML5 book… and its second addition!

  • Steve says:

    If you were going for 15 fps your SetInterval() call should use 1000/15.

  • Sam Dutton says:

    Great article!

    Just to say – Chrome Canary now uses the {video: true, audio: true} syntax.

  • Sam Dutton says:

    More specifically: Chrome 20+ uses the {video: true, audio: true} syntax.

  • bruce says:

    @Steve said “If you were going for 15 fps your SetInterval() call should use 1000/15”

    Ah yeah, typo – fixed!

    @sam Dutton – thanks; noted in the article

  • Luís says:

    Any thoughts on how can I get the streaming from the server side?

  • bruce says:

    Luís , if you want to stream video from a server to a browser, just point a video element at a streaming source. Eg, http://www.alobbs.com/1386/Streaming_WebM_VP8_One_Day_Later.html

  • Carlos says:

    14/12/2008: the demo “http://jsbin.com/avaxaq/1/edit” works fine on Opera, Opera-Mobile (on tablets, but not on Motorola RAZR, where Opera Mobile crashes) and doesn’t work at all on Chrome, Chrome Beta, Firefox, Safari (all in latest versions), any suggestion?

  • Carlos says:

    Forgot : my RAZR runs Android 4.04, Tablet on Android 4.01, all others on Windows XP PRO, latest updates , and Windows 7.

  • Sam Dutton says:

    Hi Carlos — try the simple example at simpl.info/gum.

    This works on Chrome, Firefox and Opera Next.

  • Carlos says:

    @Sam: thanks for your suggestion, I tried a lot, changing some parameters too, but none worked on my smartphone, Opera mobile, Chrome, Firefox. On windows, no pb, on tablet (android 4) it’s OK too. Could be that the duo camera of the RAZR (front and back) is the pb ?

  • Bruce Lawson says:

    Carlos, it sounds like the way the phone is configured. Sorry!

  • Chasity says:

    Hi, Neat post. There’s an issue along with your site in internet explorer, could test this? IE still is the market leader and a big component to people will pass over your excellent writing because of this problem.

  • David says:

    Hi

    Very nice post and it’s really nice to see that you guys are making the effort to get this as painless as possible. But I personally believe that there is still a long road ahead in the topic of webcam (or native hardware for that matter) interaction using HTML, so I wouldn’t go for it for production. But I would really like to see it becoming the standard because depending on Silverlight or Flash just for webcams is not a very good idea.

    Thanks again for the wonderful post!

  • Hello guys

    getUserMedia is totally cool. I just built an amazing prototype where you can send videos directly recorded from your browser, this without Flash. This in 100% JavaScript thanks to getUserMedia.

    See it yourself:
    https://www.videomail.io

    Obviously it works for Firefox and Chrome only for now. But it’s a wonderful example showing off what we can do with getUserMedia :)

    Cheers
    Michael

  • David says:

    How can I set a callback on when a new frame appears in the video stream? All examples use requestAnimationFrame but that event happens when the screen redraws and not when a frame has arrived. Monitor frames appear at typically 60 fps and cameras at 30 fps. It seems like the examples read the video twice too often and get the same frame twice.

  • Bruce Lawson says:

    David,

    good question. I have no idea.

  • pastis says:

    Hi,
    thanks for the post,
    i am quite new to HTML5 and JavaScript;

    Any hint on how can i send the live stream of the webcam(or maybe just the audio) back to the server that supplied the web page?

    i wrote a very simple http server in java, it also runs on android devices.

    when some user connects to that ip, the browser shows a simple web page;

    i would like the user to be able to send his microphone(and maybe the webcam) live to the server side,
    so the ” // … use ‘stream’ … ” part for me should be
    to broadcast the content back to the server, and not showing it to the user.

    any hint on how i can achieve that?

  • YoungNG says:

    How to stop the video and release the camera hardware to use it with another application?

  • Sam Dutton says:

    @YoungNG: you can call stop() on the stream, and you can simultaneously access a camera from gUM in multiple tabs/browsers: you don’t need to ‘release’ the camera. However, you can’t have different constraints (such as resolution or camera selection) for gUM running in different tabs.

  • ramasamy says:

    Hi Bruce,

    I’m developing a mobile web application which will run on Android and iPhone. I’m now using getUserMedia (navigator.webkitGetUserMedia for the Chrome browser) to capture images. It works fine in Chrome for Android. But if there are two cameras (front and back), the front camera is selected by default, and we want to use the back camera. Is there any alternative for this issue? Please suggest a solution.

  • Bruce Lawson says:

    ramasamy gUM uses the default camera. Personally, I don’t believe a developer has the right to change the default.

  • nero chen says:

    Why mobile browser does not support getUserMedia interfaces? You know you can run it on the phone live demo?

  • King Wilder says:

    Any examples for video and audio on how to add “Record” and “Stop” buttons, and then on ‘stop’, save the recording to the server in an ASP.NET MVC 5 application. I mostly need the script code, not the MVC code.

    Am I to also understand that in Firefox the audio and video are saved as separate files? If so, how can I play them back in sync, or merge the two programmatically?
