Document Outlines

Updated 21/01/2014 by Dr. Steve

Document outlines have changed a bit in HTML5. For a start, they’re actually in the spec (and have been for years, since 2008). The HTML5 Doctor is here to explain what document outlines are, how to make good ones, and why you should care.

What are document outlines?

The document outline is the structure of a document, generated by the document’s headings, form titles, table titles, and any other appropriate landmarks to map out the document. The user agent can apply this information to generate a table of contents, for example. This table of contents could then be used by assistive technology to help the user, or be parsed by a machine like a search engine to improve search results. The outlining algorithm has been clearly defined in the HTML5 spec, so once all browsers and assistive technologies play ball, there will be some major accessibility wins (more on support later). Before we take a look at how this new algorithm works, it’s time for a quick walk down memory lane.

Outlines in HTML4

Creating document outlines prior to HTML5 was simple. You had six heading elements, <h1> through <h6>. Lower-numbered headings ranked higher than higher-numbered ones; i.e. <h1> ranked higher than <h2>:
<h1>My fantastic site</h1>
<h2>About me</h2>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<h3>What I do for a living</h3>
<p>I sell enterprise-managed ant farms.</p>
<h2>Contact</h2>
<p>Shout my name and I will come to you.</p>
This example would produce the following outline:
  1. My fantastic site
    1. About me
      1. What I do for a living
    2. Contact
The <h2> titles are children of the <h1>, and the “About me” content has a further sub-heading using an <h3>. It’s simple but restrictive, as you have to ensure the heading levels are appropriate for the intended structure, and you’re limited to six levels. The latter restriction is usually not such a problem, but it still exists for all you heading fanatics (oh you guys!). HTML5 does this as well. The above example would produce the same outline, but it can be taken even further using the new sectioning elements.
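To make the rank-only model concrete, here is a minimal sketch in JavaScript. It is not taken from any browser or from the spec: the input shape (a flat list of `{ rank, text }` objects) and the function names `outlineFromRanks` and `render` are our own assumptions for illustration.

```javascript
// Build a nested outline from a flat list of headings using rank only
// (the pre-HTML5 model). `headings` is a hypothetical input shape:
// [{ rank: 1, text: "..." }, ...] in document order.
function outlineFromRanks(headings) {
  const root = { rank: 0, text: null, children: [] };
  const stack = [root]; // currently open sections, outermost first

  for (const { rank, text } of headings) {
    // Close every open section whose heading has an equal or higher rank
    // (a lower number means a higher rank).
    while (stack[stack.length - 1].rank >= rank) {
      stack.pop();
    }
    const node = { rank, text, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

// Render the outline as indented, numbered lines.
function render(nodes, depth = 0) {
  return nodes.flatMap((node, i) => [
    `${"  ".repeat(depth)}${i + 1}. ${node.text}`,
    ...render(node.children, depth + 1),
  ]);
}

const page = [
  { rank: 1, text: "My fantastic site" },
  { rank: 2, text: "About me" },
  { rank: 3, text: "What I do for a living" },
  { rank: 2, text: "Contact" },
];
console.log(render(outlineFromRanks(page)).join("\n"));
```

Fed the four headings from the example, this prints the same nested outline shown above. The stack is the whole trick: a heading closes every open section of equal or higher rank before opening its own.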

Sectioning elements

Warning! In practical terms, the HTML5 document outline is theoretical only: it has not been implemented in user agents, so people who rely on heading semantics get the heading level given by the h1-h6 elements (the HTML 4 outline), i.e. sectioning level is ignored.

The concepts behind HTML5 document outlines are actually older than you might think! Tim Berners-Lee posted to the www-talk mailing list back in 1991 (props to Dr Oli for digging that up), suggesting something quite close to what is demonstrated in this article. The sectioning elements <section>, <article>, <aside> and <nav> can all help to create a more logical structure in the document outline. Let’s go crazy and rewrite our previous example using only <h1> elements for headings:
<h1>My fantastic site</h1>
<h1>About me</h1>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<h1>What I do for a living</h1>
<p>I sell enterprise-managed ant farms.</p>
<h1>Contact</h1>
<p>Shout my name and I will come to you.</p>
The outline would now look like this:
  1. My fantastic site
  2. About me
  3. What I do for a living
  4. Contact
Clearly, that’s no good — we’ve lost our structure! With sectioning elements, we can make it look like before without changing those headings. In this particular example, I think <section> is most appropriate:
<h1>My fantastic site</h1>
<section>
  <h1>About me</h1>
  <p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
  <section>
    <h1>What I do for a living</h1>
    <p>I sell enterprise-managed ant farms.</p>
  </section>
</section>
<section>
  <h1>Contact</h1>
  <p>Shout my name and I will come to you.</p>
</section>
Run it through the outliner and we’re back to normal:
  1. My fantastic site
    1. About me
      1. What I do for a living
    2. Contact
But why? The sectioning elements act quite literally as their name suggests: they define sections of the parent element. These sections can be thought of as child nodes whose headings fall under their parent heading, regardless of their rank. The following example illustrates this further:
<h2>HTML5 Doctor articles</h2>
<article>
  <h1>The section element</h1>
  <p>We doctors are a bunch of chums using HTML5 and writing about how we do it...</p>
</article>
<article>
  <h1>The article element</h1>
  <p>We’ve discussed a lot of new elements here at HTML5Doctor...</p>
</article>
Even though the articles contain <h1>s, this produces the following outline:
  1. HTML5 Doctor articles
    1. The section element
    2. The article element
Equally, owing to how the outliner works, the following examples (while probably not the best use of headings) produce the exact same above outline:
<h1>HTML5 Doctor articles</h1>
<article>
  <h3>The section element</h3>
</article>
<article>
  <h3>The article element</h3>
</article>

<h2>HTML5 Doctor articles</h2>
<article>
  <h2>The section element</h2>
</article>
<article>
  <h2>The article element</h2>
</article>

<h6>HTML5 Doctor articles</h6>
<article>
  <h1>The section element</h1>
</article>
<article>
  <h1>The article element</h1>
</article>
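The way nesting, rather than rank, decides the structure in these examples can be sketched in a few lines of JavaScript. This is a deliberately simplified illustration, not the spec's algorithm: the element model (`{ tag, text, children }` objects) and the name `outlineSection` are assumptions, only the first heading child names a section, sectioning elements are assumed to be direct children of their parent section, and implied sections are not modelled.

```javascript
const SECTIONING = new Set(["section", "article", "aside", "nav"]);
const HEADING = /^h[1-6]$/;

// Outline one sectioning element (or the body). The heading's rank is
// never consulted: position in the tree alone decides the nesting.
function outlineSection(el) {
  const children = el.children || [];
  const heading = children.find((c) => HEADING.test(c.tag));
  return {
    title: heading ? heading.text : `Untitled ${el.tag}`,
    sections: children.filter((c) => SECTIONING.has(c.tag)).map(outlineSection),
  };
}

// Hypothetical model of the <h6> + <article><h1> example.
const body = {
  tag: "body",
  children: [
    { tag: "h6", text: "HTML5 Doctor articles" },
    { tag: "article", children: [{ tag: "h1", text: "The section element" }] },
    { tag: "article", children: [{ tag: "h1", text: "The article element" }] },
  ],
};

const outline = outlineSection(body);
console.log(outline.title);
for (const s of outline.sections) console.log("  " + s.title);
```

Swapping the heading tags in `body` for any other ranks leaves the result unchanged, which is the point the examples make.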
When choosing which heading to use in your documents, the spec has recommendations:
Sections may contain headings of any rank, and authors are strongly encouraged to use headings of the appropriate rank for the section’s nesting level.
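
If you generate markup from templates, this recommendation can be applied mechanically. A minimal sketch, assuming a hypothetical helper named `headingTagForDepth`; capping at h6 for anything nested more than five sections deep is our assumption, not something the spec prescribes:

```javascript
// depth 0 = top level of the document,
// depth 1 = inside one sectioning element, and so on.
function headingTagForDepth(depth) {
  if (!Number.isInteger(depth) || depth < 0) {
    throw new RangeError("depth must be a non-negative integer");
  }
  // Only six heading elements exist, so deeper levels are capped at h6.
  return `h${Math.min(depth + 1, 6)}`;
}

console.log(headingTagForDepth(0)); // h1
console.log(headingTagForDepth(2)); // h3
console.log(headingTagForDepth(9)); // h6
```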

Note: due to the lack of browser support for the document outline and the negative prognosis for future support, strengthening the current advice to a normative requirement is currently under discussion (January 2014).

You should also be aware of how differently ranked headings work when used as direct children of a sectioning element. They work just as they did prior to HTML5:
The first element of heading content in an element of sectioning content represents the heading for that section. Subsequent headings of equal or higher rank start new (implied) sections, headings of lower rank start implied subsections that are part of the previous one. In both cases, the element represents the heading of the implied section.

Untitled sections

Sectioning elements that do not contain a child heading will be labelled as “Untitled”, indicating the lack of a logical heading but preserving the outline as in the example below:
<aside>
  <section>
    <h2>Twitter</h2>
  </section>
  <section>
    <h2>Recent comments</h2>
  </section>
</aside>
Running this through the outliner produces this outline:
  1. Untitled aside
    1. Twitter
    2. Recent comments
The outliner has taken the liberty of flagging the sectioning element as untitled, to act as a warning and to preserve a logical structure. For accessibility reasons, we recommend each sectioning element have a heading, even <aside> and <nav>, as shown below. If you don’t want these headings to be visible, you can always hide them with CSS.
<aside>
  <h1>What you're saying</h1>
  <section>
    <h2>Twitter</h2>
  </section>
  <section>
    <h2>Recent comments</h2>
  </section>
</aside>
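A check for untitled sectioning elements is easy to sketch. The following is an illustration only, using a hypothetical element model (`{ tag, text, children }` objects) and a made-up function name, `untitledSections`; it treats a sectioning element as titled when it has a heading as a direct child.

```javascript
const SECTIONING = new Set(["section", "article", "aside", "nav"]);
const HEADING = /^h[1-6]$/;

// Collect the tag name of every sectioning element with no heading child.
function untitledSections(el, found = []) {
  const children = el.children || [];
  if (SECTIONING.has(el.tag) && !children.some((c) => HEADING.test(c.tag))) {
    found.push(el.tag);
  }
  for (const child of children) untitledSections(child, found);
  return found;
}

const sections = [
  { tag: "section", children: [{ tag: "h2", text: "Twitter" }] },
  { tag: "section", children: [{ tag: "h2", text: "Recent comments" }] },
];
const withoutHeading = { tag: "aside", children: sections };
const withHeading = {
  tag: "aside",
  children: [{ tag: "h1", text: "What you're saying" }, ...sections],
};

console.log(untitledSections(withoutHeading)); // [ 'aside' ]
console.log(untitledSections(withHeading)); // []
```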
Remember, elements like <section> should not be used arbitrarily. See our section article for more.

How does <hgroup> affect the outline?

<hgroup> is obsolete

Refer to How to mark up subheadings, subtitles, alternative titles and taglines

As Dr Richard Clark said in our <hgroup> article, <hgroup> is all about the document outline. The outliner will disregard all headings within <hgroup> except the one with the highest ranking. For example, if an <hgroup> contains an <h2>, an <h3> and an <h4>, only the <h2>’s text will be used as the section title in the outline. At the time of writing, <hgroup>’s future is a little uncertain. It was recently removed and then returned to the HTML5 spec, and there are proposals for its removal or replacement with an alternative. We’ll be sure to keep HTML5 Doctor up-to-date with any changes as they unfold.
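
For illustration only, given the element's obsolete status, the selection rule described above can be sketched as follows. The input shape and the name `hgroupTitle` are assumptions, as is the tie-break (when two headings share the highest rank, the first in document order wins).

```javascript
// Pick the text of the highest-ranked heading in an <hgroup>.
// `headings` is a hypothetical list of { rank, text } in document order;
// a lower rank number means a higher rank.
function hgroupTitle(headings) {
  if (headings.length === 0) return null;
  // The strict comparison keeps the earlier heading when ranks are equal.
  return headings.reduce((best, h) => (h.rank < best.rank ? h : best)).text;
}

console.log(
  hgroupTitle([
    { rank: 3, text: "A tagline" },
    { rank: 2, text: "The real title" },
    { rank: 4, text: "Another subtitle" },
  ])
); // The real title
```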

Sectioning roots

Sectioning roots, introduced in HTML5, isolate certain parts of a document in their own separate outlines. The <body> is the sectioning root of the main outline. The other sectioning root elements are <blockquote>, <figure>, <details>, <fieldset>, and <td>. Each of these is a descendant of the <body> element, but its headings are removed from the top-level outline and instead start their own isolated outline.
<h1>Top of the outline</h1>

<section>
  <h2>A heading in the outline</h2>
  <p>Lorem ipsum dolor sit amet...</p>
</section>
<section>
  <h2>Another heading in the outline</h2>
  <p>Lorem ipsum dolor sit amet...</p>
  <blockquote>
    <h1>This quoted heading will not appear in the outline</h1>
    <p>Lorem ipsum dolor sit amet...</p>
  </blockquote>
</section>
This results in the following outline. Notice that it lacks the <blockquote>’s heading, which has been isolated:
  1. Top of the outline
    1. A heading in the outline
    2. Another heading in the outline
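
The isolation can be sketched as a tree walk that simply refuses to descend into a sectioning root. As before, the element model and the name `mainOutlineHeadings` are assumptions for illustration, and the function collects a flat list of heading text instead of building the nested outline.

```javascript
const SECTIONING_ROOTS = new Set(["blockquote", "figure", "details", "fieldset", "td"]);
const HEADING = /^h[1-6]$/;

// Collect heading text that belongs to the main outline, skipping any
// subtree rooted at a sectioning root other than the body itself.
function mainOutlineHeadings(el, found = []) {
  for (const child of el.children || []) {
    if (SECTIONING_ROOTS.has(child.tag)) continue; // has its own isolated outline
    if (HEADING.test(child.tag)) found.push(child.text);
    mainOutlineHeadings(child, found);
  }
  return found;
}

const body = {
  tag: "body",
  children: [
    { tag: "h1", text: "Top of the outline" },
    { tag: "section", children: [{ tag: "h2", text: "A heading in the outline" }] },
    {
      tag: "section",
      children: [
        { tag: "h2", text: "Another heading in the outline" },
        {
          tag: "blockquote",
          children: [{ tag: "h1", text: "This quoted heading will not appear in the outline" }],
        },
      ],
    },
  ],
};

console.log(mainOutlineHeadings(body));
```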

Outlines in the real world

Unfortunately, there is little support for the new outlining algorithm right now. Search engines may be experimenting with it in their crawling algorithms as you read this, but as far as we know, headings are treated just as they were before. You won’t be penalised for using them, even if you use multiple <h1>s (which have always been okay as far as the spec is concerned). Check out our HTML5 and Search Engine Optimisation article for more on search engines and HTML5. At the time of writing, browsers and screen readers do not support these new outlines, so if you do use multiple <h1>s in your documents, it may confuse your users. It’s best to use logical heading levels, <h1> to <h6>, at least until the new outlines are more widely supported. As for browsers, recent releases of both Firefox and Chrome have user agent styles that support HTML5 document outlines. Try this bare-bones example in the latest Chrome or Firefox.

Update 21/01/2014

There is still no implementation of the document outline semantics in browsers apart from CSS styling. Refer to this recent article about The HTML5 Document Outline.

Final thoughts

Despite the spotty support, it’s definitely worth thinking carefully about your document outlines so you’re prepared for the future, and tune in here for news of improved support. Get to grips with the sectioning elements and sectioning roots and how each affects the outline. When marking up a new site, consider how you could take advantage of the new document outline algorithm. As user agent support strengthens, pages you made with your new-found knowledge of document outlines will be more accessible. Let us know what you think in the comments below!

Outliner tools

In order to test your outlines, you’ll need an outliner. Here are a few options to get you started:
  • h5o, a JavaScript implementation of the outliner, available as a bookmarklet, extension, or minified JavaScript file
  • An Opera extension
  • An online outliner where you can upload a file or submit a URL or HTML source to parse (may no longer be under development)

32 Responses on the article “Document Outlines”

Paul Irish says

Thx for covering outlines in the real world. :) IMO, it’s just not worth it to be caught up in the differences between html4 + 5 outlining and there are no consumers of the data.

I don’t expect any, either.

brad says

@Paul

Do you not think that screen-readers and/or search engines will become “consumers of the data” eventually? I would imagine the new HTML5 outline would allow them to more easily understand the data.

Adam Moore says

Eventually? They probably already are. The APIs and JS libraries will be the real beneficiaries of these more exact tags, assuming developers standardize which tags get used for what type of content.

Dylan says

Thanks for the article Mike. Interesting stuff. I’m trying to write a little outliner at the moment (maybe it’d be useful as a bookmarklet or something one day). I’ve got it parsing all the samples in the article correctly (I think), but I’m a little unsure of what happens when there’s a mix of HTML4-style outlining and HTML5’s.

For example, I’ve added a heading to a previous example:


<section>
<h1>About me</h1>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<h2>And other stuff</h2>
<p>Well, I like to surf.</p>
<section>
<h1>What I do for a living</h1>
<p>I sell enterprise-managed ant farms.</p>
</section>
</section>

The spec says:

Each section can have one heading associated with it, and can contain any number of further nested sections.

So, I take it the h2 wouldn’t become another heading associated with the <section>’s section.

Later it says:

If the element being entered has a rank lower than the rank of the heading of the candidate section, then create a new section, and append it to candidate section.

From this I’m thinking the h2 would have its own outline generated and appended as a new section to the <section>’s section. However, since the h2 isn’t a sectioning content or sectioning root element, its outline would consist of its next siblings.
What happens to the <section> that follows the h2? Is its outline added as a child section of this newly generated section for the h2, or as a sibling of the h2’s section (a child of the top <section>’s section)?

I’ve almost given myself a headache reading that algorithm in the spec. I’m sorry if it has rubbed off and caused me to write drivel above!

Oli Studholme says

@Dylan: It’s a sibling of the <h2>’s section (a child of the top <section>’s section)

  • About me
    • And other stuff
    • What I do for a living

Michel says

Thanks man… it still really confuses me…
Can you write about SEO as well…
The aspect of having just one h1 per page…
How is it now?

Neil says

I’m finding this a little confusing, maybe it’s just down to the outliner tool, but when I paste in the following example which you presented:

<aside>
  <section>
    <h2>Twitter</h2>
  </section>
  <section>
    <h2>Recent comments</h2>
  </section>
</aside>

[is this the code you were after? —ed.]

Comes out as:

1. Untitled Section
    1. Untitled Section
        1. Twitter
        2. Recent comments

Does the root node here belong to the body tag which has been added by the outliner or something like that?

Also looking at html on mozilla:

https://developer.mozilla.org/en/Sections_and_Outlines_of_an_HTML5_document

It says there that footer contributes to the document outline algorithm and they show that in their example, but in the outliner I don’t see the footer element there.

Alohci says

@Neil – I believe you are right. The root node is probably the body tag. I think this is a fault in the outliner tool, since the spec makes clear that DOM subtrees can be outlined without having to be founded on a sectioning root element, so there is no justification for adding in the body element.

The mozilla article also seems wrong. <header> and <footer> are not sectioning elements (though this is a common misunderstanding). I recommend reading either the HTML5 spec, or the excellent HTML5 Doctor articles on the two elements.

Oli Studholme says

@Nicolas Hoizey — can’t you make it scarier? :) While it’s a nice idea, the whole “<h1> everywhere” meme isn’t remotely practical until :heading(n) is specced and implemented. Nicole Sullivan covers the issue in detail in Don’t Style Headings Using HTML5 Sections.

@Neil — hopefully I’ve corrected your comment with the intended code. Gsnedder’s outliner is correct, just not as informative as it could be. Adding the code sample to a new HTML document and checking with the h5o outliner bookmarklet I get:

1. Untitled BODY
    1. Untitled ASIDE
        1. Twitter
        2. Recent comments

adding an <h1> I get:

1. This is the h1 I added
    1. Untitled ASIDE
        1. Twitter
        2. Recent comments

Our example was only for the outline generated by that code snippet, hence the confusion.

Also, that Mozilla Developer Connection article is surprisingly incorrect. The sectioning content elements are article, aside, nav, and section. It’s a wiki so hopefully someone will read this article and correct it for us soon ;)

[edit: gah, Alohci! foiled again!! =]

Tony says

One thing I’ve been confused about is why the initial H1 wouldn’t be part of the section…

The first example shows:

My fantastic site

Would this:

My fantastic site

Be wrong? Would that result in something different?

I guess I’m being thrown off because I would presume that the heading is FOR that section… but it seems like HTML5 outlines automatically associate the heading with the section/article that follows it?

Hopefully that makes sense?

Tony says

Sorry, I forgot the code tag:

First example was:

My fantastic site

Second example was:

My fantastic site

Manuel says

Very nice post indeed. Still crunching in my head how the sections work, but it’s pretty straightforward. People need to write that down and work with it a few times and they’ll get the point.

Above you said this: “If you don’t want these headings to be visible, you can always hide them with CSS.” in regards to aside. Isn’t hiding text via CSS seen as cloaking? That would hurt more than it would help.

Matt says

I too am concerned about cloaking…

Either using text-indent or a span to hide the title of that element works great but I can find no definite documentation to say that it will or won’t blackball my sites…

Anyone?

juanfevasquez says

Thanks a lot for this post!!! I’ve been reading a lot about the outline and finally I’ve found this post and the part which says “outlines in the real world”. Plain English, easy to understand!

lekko says

The more I learn the more curious about how the site details the setting of a site

Alastair says

I like the idea of being able to create useful document structures, using this method (how useful is debatable I suppose).

But it does bring about the problem of complicating CSS. Traditionally, I’ve always created default styles for my h1-h6 tags that can be used throughout the website to achieve a consistent style.

Using the “h1 anywhere” style would definitely complicate this process, and mean more CSS styles to over-ride the base styles.

Actually, I’ve just read the link Oli posted above, which advises using classes as a means to style your headings…

.h1{}
.h2{}
.h3{}
.h4{}
.h5{}
.h6{}

This feels wrong to me, and it also brings about its own problems when a CMS comes into play. I suppose you could use some JavaScript to insert relevant classes into every heading tag?

What are others’ opinions of .h1{} etc?

rajani says

Thanks for the informative article

Phoenix says

Hello. I have a question about adding a header to the element, example source:

<nav role="navigation" aria-labelledby="main-menu">
<h2 id="main-menu">Main menu</h2>
<ul role="menubar">
<li role="menuitem" tabindex="0"><a href="#">Home</a></li>
</ul>
</nav>

If we look at the code and go step by step: if I remove the element, the parent becomes Untitled nav. If I leave the element in the source I get an error about incorrectly ordered headings, but when I change it to I overuse the element. I am really confused here about what the ideal step would be.

odra says

Hello, if you have a section element which wraps a single table element, should you wrap a heading inside the table caption?
E.g:

Data important enough to have its own section

odra says

Oopz! Thought they would be ignored.
<section>
<table>
<caption><h3>caption goes here</h3></caption>
</table>
</section>

Simon Dell says

Thanks for the article. It’s good and clear.

Rant:
I was hoping to find something more inspiring regarding the uptake of this pattern. You’d think people would be more excited to use it. I can understand why they are not – legacy support is a big issue for many of the larger organisations involved; the semantics are confusing when you first come across them, and the work-arounds like .h1 {}, .h2 {} actually confuse things (despite making it easier to write CSS).

What still surprises me is that one obvious potential use-case of this mechanism is still not being talked about or used much: syndication. The outlining algorithm could successfully solve the problem of porting one document (or “) between different contexts (different sections of a site, different sites and tools etc). With the proliferation of aggregators and multiple output channels (websites, native apps) etc, you’d hope people would be pushing harder for wider adoption and better support for the new outlining.
:/Rant

Matthew Trow says

Always using <h1> in a section heading presents somewhat of a problem from a visual CSS perspective, making it an unrealistic option.

If a main section has an <h1> tag, with a specific font-weight and size, generally, each child section would have a visual representation that it’s a child, normally a smaller font size and/or less weight.

Therefore it makes a great deal more sense to continue to use the heading tags h1 through to h6 appropriately.

Aaron Whiffin says

Brilliant guide, and the flowchart image on your site is a great little cheat sheet for when using these tags, many thanks.

Ove Nyström says

Hi!

I’m having trouble with the NAV element. I have the following structure:

[header]
-[h1]: Site Name
-[nav] (role navigation)
[article]
-[nav] (role presentation) (breadcrumbs)
-[header]
–[h1]: Page Name

I get the following outline:

1. Site Name
1. Untitled Section (nav)
2. Page Name
1. Untitled Section (nav)

I want to use html5 elements but to get the outline correct it seems I have to remove NAV elements and replace with DIV. The “role” of the html5 elements should also affect the outline, right? It seems for example that NAV role “presentation” should remove it from outline?

Is it possible to get an example for an entire site? Simplified of course, but with all the elements that are on a page, including the html and body tags, navigation, main content, sidebar, comments and footer, that also generates a correct outline showing only relevant info.

Thanks for a great article!

    Ove Nyström says

    Sorry. Outline I get looks like this:

    1. Site Name
    —1. Untitled Section (nav)
    —2. Page Name
    ——1. Untitled Section (nav)

lukas says

@Ove Nyström
add a header () to the sections (you can hide it in CSS)

Jake says

How do we deal with modular content then?

We aren’t really making ‘pages’ anymore, it’s a collection of bits of content coming from all over, being assembled on the fly.

How are we to mark up all of these modules? Do we assign an arbitrary heading level?

    Steve Faulkner says

    Hi Jake, have in-page logic that sets the appropriate heading level in the context of use.

Kenneth says

Using <p> for a subheading is not semantic at all. It is important that a subheading be linked to the heading in some way. Plus, this use of <p> does not even conform to the standard’s definition of the tag. “A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences…”

    Steve Faulkner says

    Hi Kenneth,
    The full text puts the lie to your statement about the p element:

    A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.[emphasis mine].
