Video Subtitling and WebVTT

We’ve been able to play video in the browser without a plugin for a couple of years now, and whilst there are still some codec annoyances, things appear to have settled down on the video front. The next step is adding resources to the video to make it more accessible and provide more options to the viewer.

We currently have no means to provide information about what’s happening or being said in the video, which means the video isn’t very accessible and the user can’t easily navigate to a particular section of the video. Thankfully, there’s a new format specification in the works called WebVTT (Web Video Text Tracks). As of now, it’s only in the WHATWG spec, but the recently established W3C Web Media Text Tracks Community Group should introduce a WebVTT spec to the W3C soon.

You may recall a similar format called WebSRT (Web Subtitle Resource Tracks) that was recently under discussion. WebSRT was renamed to, and has been replaced by, WebVTT.

A WebVTT (.vtt) file is simply plain text containing several types of information about the video:

Subtitles
The transcription or translation of the dialogue.
Captions
Similar to subtitles, but may also include sound effects and other audio information.
Descriptions
Intended to be a separate text file that describes the video through a screenreader.
Chapters
Intended to help the user navigate through the video.
Metadata
Information and content about the video, which isn’t intended to be displayed to the viewer by default, though you may wish to do so using JavaScript.

This article will mostly be talking about subtitles and captions, but it will briefly touch on chapters too.

Beyond the scope of this article but worth mentioning is the text track API, which, amongst other things, denotes how many text tracks there are and which ones have loaded and are ready for use. If you have used this API, let us know.

How to Make and Link to a WebVTT file

All you need to make a WebVTT file is a simple text editor. Type WEBVTT as the first line of the file and save it as a .vtt file. In the future, we expect existing captioning tools such as Universal Subtitles to export to WebVTT format.

WEBVTT
The simplest possible valid WebVTT file

That’s all you need to get started. Next, we have to link to the file in the HTML document. We do this via the <track> element, which is a child of the <video> element. The <track> element has several attributes:

  • the source WebVTT file (src),
  • the language of the track (srclang),
  • a user-readable label, and
  • what kind of track it is. The values of the kind attribute come from the list above (i.e., subtitles, captions, etc.).

In the following example, we’re using a <track> element for subtitles:

<video width="640" height="480" controls>
  <source src="video.mp4" type="video/mp4" />
  <source src="video.webm" type="video/webm" />
  <track src="subtitles.vtt" kind="subtitles" srclang="en" label="English" />
  <!-- fallback for rubbish browsers -->
</video>

A few notes about the attributes:

  • If no kind is specified, the default is subtitles.
  • If the kind is subtitles, then srclang is required.
  • There should not be two tracks of the same kind with the same label.
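
These three notes are easy to check mechanically before you write any markup. Here’s a minimal sketch in JavaScript. The function name checkTracks and the plain-object shape (src, kind, srclang, label) are our own illustrative choices, not part of any browser API.

```javascript
// Hypothetical helper: checks plain track descriptors against the three
// attribute notes above. It does not touch the DOM.
function checkTracks(tracks) {
  const problems = [];
  const seen = new Set();

  for (const track of tracks) {
    // A missing kind defaults to "subtitles".
    const kind = track.kind || "subtitles";
    const label = track.label || "";

    // Subtitles must declare their language.
    if (kind === "subtitles" && !track.srclang) {
      problems.push(`${track.src}: subtitles track needs srclang`);
    }

    // No two tracks of the same kind may share a label.
    const key = `${kind}|${label}`;
    if (seen.has(key)) {
      problems.push(`${track.src}: duplicate ${kind} track labelled "${label}"`);
    }
    seen.add(key);
  }

  return problems;
}
```

An empty array means the list passed all three checks; otherwise each entry describes one problem.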

In the above example, we use a <video> element with two different <source> elements (for cross-browser loveliness). After the sources comes the <track> element. You can have several <track> elements, as you might have subtitles, captions, and descriptions all in different languages.

<track> doesn’t presuppose a particular file format. At least one browser vendor is working with the TTML format, but this format will not be implemented by other browser vendors.

WebVTT Contents

We now know how to make a WebVTT file and how to reference it in an HTML document, but what goes inside it? Within the file, we list what are known as “cues”. The WebVTT file might only have one cue, but it can contain as many as you want. Each cue starts with an ID, followed by the time settings, followed by the text. Each cue is separated by a blank line. Here’s an example of captions:

WEBVTT

1
00:00:01.000 --> 00:00:10.000
This is the first line of text, displaying from 1-10 seconds

2
00:00:15.000 --> 00:00:20.000
And the second line of text
separated over two lines
WebVTT example content

The above example has two cues. Times must be written in hh:mm:ss.mmm format, so the timings in this example occur in the first twenty seconds. The second cue will split the text over two lines automatically.
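
If your cue data already lives somewhere else (a spreadsheet, a database), you may prefer to generate the file rather than type it. The following is a rough sketch, assuming cues are plain objects with start and end in seconds plus a text string. formatTimestamp and buildWebVTT are names we made up for illustration.

```javascript
// Convert a number of seconds into the hh:mm:ss.mmm form used by cues.
function formatTimestamp(totalSeconds) {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the text of a WebVTT file: the WEBVTT line first, then each cue
// (ID, time settings, text), with a blank line between blocks.
function buildWebVTT(cues) {
  const blocks = cues.map((cue, index) =>
    [
      String(index + 1), // numeric cue ID, as in the example above
      `${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}`,
      cue.text,
    ].join("\n")
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}
```

A cue whose text contains a line break ("\n") will be written over two lines, just like the second cue in the example.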

If you have a segment of text that needs to appear in a karaoke/paint-on caption style, then you can place timers inline with text:

1
00:00:01.000 --> 00:00:10.000
Never gonna give you up <00:00:01.000> Never gonna let you down <00:00:05.000> Never gonna run around and desert you
Karaoke-style captioning
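
To see how a script or polyfill might read those inline timers, here is a small sketch that splits cue text into timed segments. It assumes timers are always written in the full hh:mm:ss.mmm form, and that text before the first timer belongs to the cue’s own start time. splitKaraokeText is a hypothetical helper, not part of any API.

```javascript
// Split karaoke-style cue text into { time, text } segments.
// cueStart is the cue's start timestamp as a string.
function splitKaraokeText(cueStart, text) {
  const stamp = /<(\d{2}:\d{2}:\d{2}\.\d{3})>/g;
  const segments = [];
  let currentTime = cueStart;
  let lastIndex = 0;
  let match;

  while ((match = stamp.exec(text)) !== null) {
    // Text before this timer appears at the previously seen time.
    const chunk = text.slice(lastIndex, match.index).trim();
    if (chunk) segments.push({ time: currentTime, text: chunk });
    currentTime = match[1];
    lastIndex = stamp.lastIndex;
  }

  // Whatever follows the last timer appears at that timer's time.
  const rest = text.slice(lastIndex).trim();
  if (rest) segments.push({ time: currentTime, text: rest });
  return segments;
}
```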

Styling Options

The previous examples specify the minimum configuration you need for subtitling and captioning, but you can style your captions too. Let’s start with the cue settings, which are done on the same line as the time settings:

D:vertical / D:vertical-lr
Display the text vertically rather than horizontally. This also specifies whether the text grows to the left (vertical) or to the right (vertical-lr).
L:X / L:X%
Either a number or a percentage. If a percentage, then it is the position from the top of the frame. If a number, this represents what line number it will be.
T:X%
The position of the text horizontally on the video. T:100% would place the text on the right side of the video.
A:start / A:middle / A:end
The alignment of the text within its box – start is left-aligned, middle is centre-aligned, and end is right-aligned.
S:X%
The width of the text box as a percentage of the video width.

To make use of these settings, put them alongside the time settings, like this:

00:00:01.000 --> 00:00:10.000 A:middle T:50%
00:00:01.000 --> 00:00:10.000 A:end D:vertical
00:00:01.000 --> 00:00:10.000 A:start T:100% L:0%

which would result in something like the following:

Examples of subtitle display and alignment
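
If you’re writing your own tooling, a time settings line like the ones above can be pulled apart with a few lines of JavaScript. This is only a sketch: the property names (direction, line, position, align, size) are our own labels for the D, L, T, A and S settings described in this article, and unknown settings are silently ignored.

```javascript
// Our own labels for the single-letter cue settings described above.
const SETTING_NAMES = new Map([
  ["D", "direction"],
  ["L", "line"],
  ["T", "position"],
  ["A", "align"],
  ["S", "size"],
]);

// Parse "start --> end [settings...]" into an object.
function parseTimingLine(line) {
  const [start, arrow, end, ...settingTokens] = line.trim().split(/\s+/);
  if (arrow !== "-->" || !end) {
    throw new Error(`Not a cue timing line: ${line}`);
  }

  const settings = {};
  for (const token of settingTokens) {
    const separator = token.indexOf(":");
    if (separator <= 0) continue; // not in KEY:value form
    const name = SETTING_NAMES.get(token.slice(0, separator));
    if (name) settings[name] = token.slice(separator + 1);
  }
  return { start, end, settings };
}
```

For example, parsing the third line above gives an align of "start", a position of "100%" and a line of "0%".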

Along with the above cue settings, you can use inline styles for text:

Bold text
<b>Lorem ipsum</b>
Italic text
<i>dolor sit amet</i>
Underlined text
<u>consectetuer adipiscing</u>
Ruby text
<ruby>見<rt>み</rt></ruby>

You can even apply a CSS class to a section of text using <c.myClass>Lorem ipsum</c>, giving us many more styling options.

Finally, you can add a declaration representing the name of the voice: <v Tom>Hello world</v>. This declaration accomplishes three things:

  1. The caption will display the voice (Tom) in addition to the caption text.
  2. The name of the voice can be read by a screenreader, possibly even using a different voice for male or female names.
  3. It offers a hook for styling so that, for example, all captions for Tom could be in blue.
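
As a rough illustration of how a script could pick up that hook, the sketch below separates the voice name from the caption text. It only handles a single voice declaration that wraps the whole cue, which is an assumption we’re making to keep things short. parseVoice is a made-up name.

```javascript
// Extract the voice name and the spoken text from a cue such as
// "<v Tom>Hello world</v>". Returns voice: null when there is no
// voice declaration at the start of the cue.
function parseVoice(cueText) {
  const trimmed = cueText.trim();
  const match = /^<v\s+([^>]+)>([\s\S]*?)(?:<\/v>)?$/.exec(trimmed);
  if (!match) {
    return { voice: null, text: trimmed };
  }
  return { voice: match[1].trim(), text: match[2].trim() };
}
```

With the voice name in hand, you could, for instance, add a class per speaker and colour all of Tom’s captions blue.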

Chapters

You can provide a chapter list for the video the same way you would provide subtitles or captions. Start with the same WEBVTT declaration, and then for each cue, declare the chapter number, the start and stop times, and the chapter title:

<track src="chapters.vtt" kind="chapters" srclang="en" />
HTML <track> element for providing chapters to a video

WEBVTT

Chapter 1
00:00:01.000 --> 00:00:10.000
Introduction to HTML5
WebVTT file containing video chapter markers
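
Once chapter cues are available to a script, working out which chapter the viewer is in comes down to a time comparison. Here’s a minimal sketch, assuming the chapters have already been read into plain objects with start, end and title properties. toSeconds and findChapter are hypothetical helpers.

```javascript
// Convert an hh:mm:ss.mmm timestamp into a number of seconds.
function toSeconds(timestamp) {
  const [h, m, s] = timestamp.split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Return the chapter whose time range contains currentTime (in seconds),
// or null if no chapter covers that moment.
function findChapter(chapters, currentTime) {
  const found = chapters.find(
    (chapter) =>
      currentTime >= toSeconds(chapter.start) &&
      currentTime < toSeconds(chapter.end)
  );
  return found || null;
}
```

In a page, currentTime would typically come from the video element’s currentTime property.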

Browser Support

One slight glitch with WebVTT: not a single browser currently supports it. All major browsers have started working on implementations, so we should see some results soon. Thankfully, in the meantime, there are several JavaScript polyfills available, such as Playr and MediaElementJS.

Demo

We’ve put together a quick demo which uses the Playr polyfill. We started using MediaElementJS, but it doesn’t sport as many features as Playr, such as separate lines of text and CSS classes. In the demo, the subtitles start at 2 seconds and 15 seconds and use bold, underline, and custom styles. Here’s the associated WebVTT file.

Conclusion

This article covers the basics of creating a WebVTT file suitable for subtitles or captions for a video. We know how to add cues and chapters and how to add styles and change how the text appears on the video. Although no browser yet supports it, there’s a lot more to come for accessible video, so stay tuned to the W3C Web Media Text Tracks Community Group.

What are your thoughts on WebVTT? Are any of you using it now? How can it be improved?

Finally, let’s thank @silviapfeiffer for taking the time to answer some questions about WebVTT and for her tremendous work in this field.

21 Responses on the article “Video Subtitling and WebVTT”

  • Marcia says:

    Shouldn’t the A:end subtitle text in the middle image be right-aligned? And what does “line number” refer to, pixels from top?

  • Thanks Marcia, that image is actually incorrect. Whoops :(
    I’ll get a new one uploaded…

  • New image uploaded.

    If the line value is not a percentage (measured from the top of the video), it is a line number based on the size of the first line of the cue.

  • Ronny says:

    Hi Tom,

    to make your picture examples little bit more correct you should add the cue text lines to the cue time line + settings examples since the second picture shows a cue of two text lines while the other pictures only shows a cue of one text line.

  • I had been looking for out of the box .SRT (format defined on youtube to upload caption and provide accessible content to the user on flash videos, more info at http://www.google.com/support/youtube/bin/answer.py?answer=100077) functionality earlier, got to see finally implemented in HTML5 Video.

    That’s cool and looks like promising to see the same step taken.

    One more reason to go for HTML5 video :)

  • Patrick Hall says:

    I find myself wondering: why do we need a whole separate language for marking up these kinds of annotation? Couldn’t it be done with a <track> tag or a microformat (something like <p class=line data-start=00:00:00 data-stop=00:00:03>a line like this</p>) or something to that effect?

    I’m not trying to be contrary or complain, WebVTT looks very useful and I’m sure there are good reasons for doing it this way… I’m just wondering what those are.

  • foobar says:

    IE10 PP4 supports <track> with both WebVTT and TTML.

  • cees says:

    I was trying to get Lean Back Player to show a VTT file. I tried everything but could not get it done.
    After a while I tried Captionator and Playr; those worked, so there must be a bigger problem in Lean Back Player.
    Pity, it looks like a nice player.

    cees

  • For your chapter example it looks like there’s an extra greater than sign at the end. Am I correct or does WebVTT require this for chapters?

  • The styling info is helpful but how do you change the style of all the subtitles without adding a class to every one? Like, make them all bigger? I tried putting styles on the track element but that didn’t work.

  • I just put HTML5 video with subtitles into production. Took much longer than I thought as being still rather cutting edge, there are various pitfalls along the way. Check out the review of subtitle support in the major HTML5 video players I just wrote up to save you some hassle with the players/polyfills listed here and others.

  • Hi,
    I made a simple script that converts .SRT or .ASS subtitles to .VTT subtitles.
    If anyone’s interested, here it is:
    http://rikudou.naruto-sekai.com/subtitles/ass-to-vtt.php
    http://rikudou.naruto-sekai.com/subtitles/srt-to-vtt.php

  • Martin Kirk says:

    I can’t make VTT subtitles to work

    i’ve made a simple html5 page embedding a MP4 movie along with VTT Subtitles according to specifications.

    Chrome shows that there are captions for the movie, but they don’t show up? All samples on the internet seem to use Playr.js to make it work… I thought it would work natively?

    I would prefer if embedded SRT subs would just work like in almost all windows MP4 players … but that would be too much to ask for :-(

  • sunil says:

    I am trying to test HTTP Live Streaming with WebVTT subtitles, so I just wanted to ask: do you have any test HLS URL with WebVTT subtitles?
    I tried to google for the same link/url, but unable to get it.

    Regards,
    Sunil

  • Davy Brouwer says:

    How can I use diacritics such as ä, ë, ü, é etc. in VTT files, and what about ß?

  • adrian says:

    Hi
    I’m wondering if I can convert XML files to VTT; if not, then how can I convert SRT files to VTT?
    thanks

  • Annie says:

    Please help -Syncing plain text file with video file

    I am desperately looking for a solution where I can display a plain text file along with a video file in such a way that the single playback control (of video) also makes the text file scroll – so both files scroll with a single player control. My text file is plain text file. I dont want both files to be accurately synced so long as they both end roughly end at the same time. I would appreciate any help with this or some example code as I am new to programming.
    Kind Regards
    Annie

    • mike says:

      The cuechange event is available on all browsers now.

      You could in theory listen for that event on the track, then get the track.activeCues[0].text and use that string to move forward in your text file.

  • beausejour says:

    I’m looking for a feature that should obviously (to me at least) exist, but that I just can’t find.

    With video files we can add a *.srt file which displays specified text at specified times. I have a working knowledge of SRT files.

    It seems to me that there must something equivalent to display specified images at specified times during the playing of an audio file. It could be thought of as a kind of visualization. I would imagine a simple text file that would look like an SRT file but instead of embedded text, there would be a pointer to an image file.

    Where would this be used? Imagine an audio lecture. An audio file is far more compact than a video file, but it would still be nice to be able to display images, drawings, maps, etc. at the appropriate time points and for the appropriate duration.

    If anyone knows of an audio player that includes this function, OR can tell me how to code it in, say, HTML5, I would be very thankful.

  • Luis says:

    How can I translate the subtitles on mouse over? I have seen it done.
